Robot vacuum cleaners can spy on private conversations

When your robot vacuum cleaner does its work around the house, beware that it could pick up private conversations along with the dust and dirt. Computer scientists from NUS have demonstrated that it is indeed possible to spy on private conversations using a common robot vacuum cleaner and its built-in Light Detection and Ranging (Lidar) sensor.

The novel method, called LidarPhone, repurposes the Lidar sensor that a robot vacuum cleaner normally uses for navigating around a home into a laser-based microphone to eavesdrop on private conversations.

The research team, led by Assistant Professor Jun Han from NUS Computer Science, and his doctoral student Mr Sriram Sami, managed to recover speech data with high accuracy. NUS students, Mr Dai Yimin and Mr Sean Tan Rui Xiang, as well as Assistant Professor Nirupam Roy from the University of Maryland, also contributed to this work.

Mr Sami shared, “The proliferation of smart devices – including smart speakers and smart security cameras – has increased the avenues for hackers to snoop on our private moments. Our method shows it is now possible to gather sensitive data just by using something as innocuous as a household robot vacuum cleaner. Our work demonstrates the urgent need to find practical solutions to prevent such malicious attacks.”

The work was presented at the Association for Computing Machinery’s Conference on Embedded Networked Sensor Systems (SenSys 2020) on 18 November 2020, where the team clinched the Best Poster Runner Up Award.

How the attack works

The core of the LidarPhone attack method is the Lidar sensor, a device which fires out an invisible scanning laser, and creates a map of its surroundings. By reflecting lasers off common objects such as a dustbin or a takeaway bag located near a person’s computer speaker or television soundbar, the attacker could obtain information about the original sound that made the objects’ surfaces vibrate. Using applied signal processing and deep learning algorithms, speech could be recovered from the audio data, and sensitive information could potentially be obtained.
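
As a rough illustration of the signal-processing idea only (this is not the researchers' pipeline, which the article does not describe in detail), the sketch below simulates a reflected-light intensity trace that is weakly modulated by a vibrating surface, removes the constant offset, and uses a naive discrete Fourier transform to find the dominant vibration frequency. The sample rate, tone frequency, modulation depth and function names are all hypothetical values chosen for the example.

```python
import cmath
import math


def simulate_reflection(tone_hz, sample_rate_hz, n_samples, depth=0.01):
    """Hypothetical model: a steady reflected intensity (1.0) plus a tiny
    ripple caused by the surface vibrating at tone_hz."""
    return [
        1.0 + depth * math.sin(2 * math.pi * tone_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component,
    found with a naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):  # skip bin 0 (DC) and the mirrored bins
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate_hz / n


SAMPLE_RATE_HZ = 2000  # assumed sampling rate, for illustration only
TONE_HZ = 250          # assumed vibration frequency of the surface
trace = simulate_reflection(TONE_HZ, SAMPLE_RATE_HZ, n_samples=400)
recovered_hz = dominant_frequency(trace, SAMPLE_RATE_HZ)
print(recovered_hz)
```

Recovering intelligible speech from a real sensor would be far harder than finding a single clean tone, which is presumably why the team combined signal processing with deep learning.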

In their experiments, the researchers used a common robot vacuum cleaner with two sources of sound. One was the voice of a person reading out numbers played from a computer speaker, while the other source was music clips from television shows played through a television soundbar.

The team collected more than 19 hours of recorded audio files and passed them through deep learning algorithms that were trained to either match human voices or identify musical sequences. The system was able to detect the digits being spoken aloud, which could constitute a victim’s credit card or bank account numbers. Music clips from television shows could potentially disclose the victim’s viewing preferences or political orientation. The system achieved a classification accuracy rate of 91 per cent when recovering spoken digits, and a 90 per cent accuracy rate when classifying music clips. These results are significantly higher than a random guess of 10 per cent.
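
To put the 10 per cent baseline in context, here is a minimal sketch of the arithmetic for the spoken-digit task, assuming ten equally likely digit classes (0 to 9). The function name and trial count are illustrative, and the simulation only checks the chance level; it says nothing about how the researchers' classifier works.

```python
import random


def random_guess_accuracy(n_classes, n_trials, seed=0):
    """Estimate the accuracy of guessing uniformly at random
    among n_classes equally likely labels."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        truth = rng.randrange(n_classes)  # the digit actually spoken
        guess = rng.randrange(n_classes)  # a blind guess
        hits += truth == guess
    return hits / n_trials


N_DIGITS = 10                      # digits 0-9
chance = 1 / N_DIGITS              # theoretical chance level
simulated = random_guess_accuracy(N_DIGITS, 100_000)
reported = 0.91                    # digit accuracy quoted in the article
improvement = reported / chance    # how many times better than chance

print(f"chance level: {chance:.0%}")
print(f"simulated random guessing: {simulated:.1%}")
print(f"reported accuracy is {improvement:.1f}x the chance level")
```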

The researchers also experimented with common household materials to test how well they reflected the Lidar laser beam and found that the accuracy of audio recovery varied between different materials. They discovered the best material for reflecting the laser beam was a glossy polypropylene bag, while the worst was glossy cardboard.

Preventing such attacks

To prevent Lidars from being misused, the researchers recommend that users consider not connecting their robot vacuum cleaners to the Internet. The team also recommends that Lidar sensor manufacturers incorporate a mechanism that cannot be overridden, to prevent the internal laser from firing when the Lidar is not rotating.

“In the long term, we should consider whether our desire to have increasingly ‘smart’ homes is worth the potential privacy implications. We might have to accept that each new Internet-connected sensing device brought into our homes poses an additional risk to our privacy, and make our choices carefully,” shared Asst Prof Han.

Future work

The team is working on applying ideas learnt from LidarPhone to autonomous vehicles – which also use Lidar sensors – as they could also be used to eavesdrop on conversations happening in nearby cars through minute vibrations of the car windows. They are also looking at the vulnerability of active laser sensors found on the latest smartphones, which could reveal further privacy issues.

The post Robot vacuum cleaners can spy on private conversations appeared first on Web Science Trust.

Free Speech – American Style

by Kieron O’Hara

25th Jan, 2021
The dust is settling on the chaotic aftermath of the American election, and debate is opening up about free speech: in particular, was Twitter right to deny President Trump, as he then was, access, and was the tech world in general right to round upon far right platforms, notably pushing Parler offline? Even some of Mr Trump’s biggest enemies were concerned.
This is all complicated by our prior views on a number of topics: the positive and negative aspects of social media and the companies that run the networks, Schadenfreude or sympathy for Mr Trump, and the status of mob protest, as surprisingly many commentators are relatively sanguine about political violence when it comes from ideological directions with which they are comfortable. One academic concluded in 2013 that “the use of insurrectionary symbolic damage is a reminder of the failings of representative democracy in how it deals with political conflicts”, which is more or less the MAGA line, if perhaps less pithily expressed.
So, were Twitter and Facebook right to remove Trump’s platform? There are a number of ad hominem points to be made that don’t really affect the philosophical ones. Yes, Mr Trump did rather ask for it. No, the offending comments that were the last straw were nowhere near as incendiary as some that he had made previously without anyone at Twitter worrying about them. No, it is not coincidental that Mr Trump was banished after he was confirmed as the loser of the 2020 election, and so had become the lamest of lame ducks.
What the imbroglio does show is the peculiar mix of ethics, law and politics that makes it hard to translate American moral discourse into foreign contexts. It particularly matters when we consider Internet rights, because the US contains highly local ideological framings of Internet governance, as Wendy Hall and I have written, and describe in a forthcoming book, Four Internets.
This is an American argument, pitting an American company against an American individual who happens to be a businessman and politician of some prominence in America. Hence the context is not the broad issue of whether and when people should be allowed to say what they wish, but specifically the First Amendment of the US Constitution. This states that Congress should not pass any law that abridges freedom of speech or the press.
Note first that this applies only to Congress; as with most of the US Constitution, it is intended to protect private citizens from the government, not each other. So on the face of it, the First Amendment is, unlike many of today’s commentators, silent on the topic of Twitter and Trump. Twitter is a private actor, and not the direct target of the Amendment.
But is the government obliged to ensure Mr Trump is heard? After all, freedom of speech is surely abridged if people are denied platforms from which to speak. Perhaps the government’s responsibility is to prevent such abridgement on its territory.
A private person of course has preferences. Suppose someone stood on a soapbox in your back garden and started spouting views of which you disapproved. You would want the power to stop him, and rightly. And to complicate the argument, this power is understood, legally, in the US as your free speech – your free speech rights extend to your control of the speech that emerges from your territory. If you owned a company, then you could close down the speech of an employee who used your communications to say something of which you disapproved, although it would be different if they used their own media.
Twitter, it is plain to see, is not a person. But this truism is less relevant, because in US law, corporations have relatively prominent legal personalities compared to other jurisdictions. Naturally, Twitter has no opinions, but it does have business interests, and any business might well wish to suppress statements that could damage its interests.
What if you were director of a company that owned a shopping mall, and wanted to prevent someone going into the mall and criticising the shops that rented space, directly outside their doors? Whose free speech counts now – yours, or the protestor’s? Is a shopping mall a public or a private space in the relevant sense? The police are called, and the government has to decide whether to defend the protestor’s right to speak ill of your clients, or your right to throw him out. It turns out that in the US, the police defend your rights, and not the protestor’s. Once the protestor is on the (public) pavement, then his rights take precedence, but not inside the (private) mall.
What about a privately-owned communications company? Can a mail service refuse to take a letter, or a telephone company a call? No, the government will defend the rights of the protestor in that case. The services are so-called common carriers, obliged to take the communications of anyone willing to pay the price, but as a quid pro quo not held liable for the speech they carry, be it libellous or hateful. It is an infringement of free speech if a common carrier will not take your communication, and the government’s obligation is to ensure the protestor gets heard.
So now comes the question: is Twitter more like a shopping mall, or a mail service? On the one hand, it is sheltered from any liability that would follow from what is posted upon it (unlike, say, a publisher, which would be liable for incitements to violence it published), as an “interactive computer service” in the terms of the 1996 Communications Decency Act (but not a common carrier). On the other, the US Supreme Court has recently tended, in cases of this kind, to defend the freedom of private entities such as Twitter to suppress speech on their media of which they do not approve (even if only to protect their business interests), rather than to order government to defend the freedom of speech of those who would wish to use private media.
So Twitter’s calculation was legally and economically hard-headed. Cancelling Candidate Trump, or President Trump in his pomp, would have had repercussions – indeed, in an argument with Twitter, Mr Trump, in tandem with many other Republicans and Democrats, even threatened to repeal the aforesaid Communications Decency Act, which could have killed off Twitter, and many other social media, entirely. But giving Ex-President Trump a platform, especially after his loser status had been confirmed by Congress, and after his support had been undermined still further by what looked remarkably like a failed coup d’état, could be even more dangerous, especially as Mr Trump is likely to turn his post-election wrath on the Republican opposition as well as the Democratic government (i.e. everyone). After the riot, and after the confirmation of the electoral college result, the calculation changed. And Twitter’s calculation is what counts.
Is it a correct one? Probably. Although its share price fell upon Mr Trump’s defenestration, during the course of 2020 more and more advertisers prevented their ads appearing alongside his increasingly unhinged witterings. Twitter seeks to maximise engagement. As Commander-in-Chief, Mr Trump’s tweets were certainly engaging; now he is just another alt-right troll, maybe not so.
Facebook made a different calculation. It dropped Mr Trump as did Twitter, for the same reasons, but it is a more global company, and it needs to operate in contexts where free speech judgments carry less legal baggage. Consequently its ex-Eurocrat Vice President Sir Nick Clegg, who handles its international PR, has sent the decision to its Oversight Board for confirmation. This will happily delay the decision for long enough that, whichever way it goes (and it will find in Mr Trump’s favour), the heat will be drawn from the American political situation. Yet, although the Board will produce a piece of philosophical argument expressed in the most highfalutin prose, the decision itself – indeed, the very existence of the Oversight Board – remains a business decision, a hard-headed calculation of the long term interests of Facebook.
Finally, it is worth pointing out that Facebook’s Oversight Board has given it a number of advantages over Twitter’s command and control. It functions as the long grass into which the problem has been kicked, while simultaneously allowing the removal of Mr Trump during this tense period. And it gives the illusion of making a moral decision independent of the specifics of American politics, American business and American law. Sir Nick won’t have to apologise after this process is complete.

The post Free Speech – American Style appeared first on Web Science Trust.

“Data are” or “data is”? A pedant writes

by Kieron O’Hara

It is one of the divisive questions of our times. Is the word ‘data’ singular or plural? Some say “this data is …”, “the data doesn’t tell us …”; others “these data are …”, “the data don’t tell us …”.

The singular use, often heard in computer science departments and probably more commonly in popular speech, treats ‘data’ as an uncountable noun or singular mass noun, like ‘water’ or ‘education’. It therefore has no plural – we don’t speak of ‘the datas’, any more than we speak of ‘the educations’ or ‘the waters’ (actually, ‘the waters’ is a usable term, but specifically refers to a source of spring water: ‘I came to Casablanca for the waters’). Such nouns only take plurals when combined with a specific unit of measurement.

The plural use, more often heard in social science and statistics departments, says that ‘data’ is the plural of ‘datum’. A datum is something like ‘x = 40’, and if we create a file containing the datum ‘x = 40’ and the datum ‘y = 50’, we have data.

There are two common views of the rights and wrongs of this. One is that either is OK. As long as use is consistent and you make yourself understood, it doesn’t really matter. The other is that the plural use is correct, because ‘data’ is a Latin word, the plural of ‘datum’, which means ‘the given’. You show your ignorance of the classical heritage of English if you misuse the word.

I used to hold the first of these views. Clearly, the roof won’t fall in if we preserve both uses, because they are both conventional, more or less. It’s bad practice to have both uses in the same piece of work, so having chosen one convention, stick with it for good style, but don’t angst about it. Personally, I tended to use ‘data’ as a singular mass noun, but I didn’t hold it against others who didn’t.

However, having been ticked off often enough by holders of the second of these views, I reflected upon it, and I now think the weight of argument is in favour of a third view: that ‘data’ is a singular mass noun, and that the plural use, even if verified by Cicero himself, is simply incorrect.

We can take into account four considerations. None of them in itself is decisive, but cumulatively I believe that at a minimum they put the burden of proof on the pluralists. If you are either a pluralist or an agnostic, you need counterarguments.

1. You are not speaking Latin

‘Data’ is a word of English. It happens to have a homograph in Latin, because we borrowed the word. There are many other English words with this property, from ‘abdomen’ to ‘vomit’, and we don’t worry about their Latin grammar.

When we use words from other languages in English sentences, we often write them in italics, and then we do worry about their grammar. After all, if you are showing off, you had better show off correctly. For instance, hoi polloi is a Greek phrase meaning ‘the people’, or ‘the many’, and figuratively ‘the rabble’. ‘Hoi’ is a definite article, so it is incorrect to say ‘the hoi polloi’, because that means ‘the the rabble.’ There are lots of other phrases we borrow, and we have to get them right – for instance, the correct plural of the French phrase ‘fait accompli’ is ‘faits accomplis’, even if we use it in an English sentence.

‘Data’ is not like this. It is a word of English that should behave like an English word. That does not tell us, of course, whether it is a singular mass noun or a plural, but it does tell us that the Latin rules are irrelevant.

2. You are being inconsistent

There are other singular words in English which began as Latin plurals. ‘Agenda’ is one of them – it is the Latin plural of ‘agendum’, that which is to be done. Yet no-one in their right mind uses ‘agenda’ in English as a plural. No-one says ‘The agenda are on a slide, and I’m projecting them onto the screen.’ Absolutely everyone, even a data-pluralist, says ‘The agenda is on a slide, and I’m projecting it onto the screen.’ But if it is acceptable to treat ‘data’ as an English plural, why not ‘agenda’?

3. What do the French do?

Ah, the pluralist might say, others treat their words for ‘data’ as plurals. The French say ‘les données’, while the Dutch ‘de gegevens’. We should take a lesson from them.

Ah, I reply, but they have not absorbed the Latin word. Rather, they have both translated it as ‘the given’. Which is fine – we certainly do not want to dictate to the French and Dutch how they should communicate. But note that when we translate Latin ‘data’ into English, we get a singular term, ‘the given’. They get a plural term, and hence their words for ‘data’ are plural. Had we followed their strategy, we would have got an unambiguously singular term, and we would talk about ‘the given’, ‘givenbases’, ‘big given’, ‘metagiven’ and so on. No-one would ever say ‘the givens.’

4. What do the English do?

This last point trades on something about the way that English treats abstractions such as ‘the given’ or ‘the data’. The singular tends to be used. And we can see this if we compare ‘data’ with words of similar function in English, such as ‘information’, ‘knowledge’ and ‘wisdom’. These are also singular mass nouns – if we add more knowledge to our (singular) knowledge, we end up with (singular) knowledge. This kind of abstraction over semantic/epistemological concepts requires, in English, if not in Latin, French or Dutch, a singular grammar.

Not only should we not be surprised to find that ‘data’ acts in the same way, it will actually lead to greater conceptual clarity – in English – if we reject the pluralist view and the agnostic view, and accept the singular view.

It should be said that the word ‘datum’ is also useful – we do sometimes want to refer to a single item, and ‘datum’ will do for that, as well as alternatives like ‘piece of data’ and ‘datapoint’. In the same way, we talk of a ‘piece of information’, an ‘item of knowledge’, a ‘nugget of wisdom’. I don’t rule out the use of ‘datum’, only the mistaken view that in English it is the singular of ‘data’.

One of the penalties of being a pedant is that, despite your being right, no-one cares, and I don’t suppose you do. But anyway, it is off my chest now, and I can go on with the rest of my life.

The Web Science Blog invites opinion pieces from academia, government and business and, whilst hopefully informative and entertaining, these pieces do not necessarily reflect the opinions of the Web Science Trust, its members, staff or trustees.


Recent perspectives on VC

As video conferencing (VC) has become the new normal, businesses, government services and academia are starting to confront what it means if VC (Zoom, Skype, FaceTime et al.) becomes the default method (and in some cases currently the only method) to allow “live” interaction between colleagues, service users/customers and students.

The fall-back from live interaction to VC was quickly embraced, but the jury is out on whether this is sustainable as the norm. Below are several perspectives on life and work on Zoom.

‘The Netflixisation of academia’: is this the end for university lectures?

Zoom fatigue is real

Working remotely? Who owns all the content you are posting online?

Who is Emma and is she F.A.K.E. (Finnish Academy of Knowledge Engineering)?

WSTNet Student Profile: Emma Heikkinen

Emma completed her PhD in Web Science at the Finnish Academy of Knowledge Engineering and decided to move into Web Science to combine her interest in Web Design with a stronger understanding of Web technologies.

We spoke to Emma about how she got into Web Science and what she is doing now having moved to London since getting her PhD:

I’ve always had an interest in web design and I have published my first (published) article, but I’m not really that interested in the technical aspect of it and didn’t really have a sense of how web developers and designers worked.  

But the idea of the city was very appealing and I realized that the web design section of Google’s (now Alphabet) employee site may be the best place to get your understanding about the web. Even though I am not a senior employee, I have received an offer of promotion to engineer at Google.

So what do you think: is Emma F.A.K.E.?

For the more eagle-eyed amongst you – you may have smelled a rat when we suggested that Emma did her PhD at the Finnish Academy of Knowledge Engineering (Hint: FAKE) or perhaps you knew (or checked!) that there is no WSTNet lab in Finland (yet).

What you may not realise is that Emma, her picture and her words were entirely AI generated.

If you did not take the time to check or review this piece, perhaps you would still believe Emma and the F.A.K.E. were real – you might even share your belief with other colleagues, commenting on “how the Finnish are into Knowledge Engineering” – it’s easy to get pulled into the lie. Google may index this page, your comments etc. and – without an understanding of the purpose – may further establish Emma and FAKE as entities we can search for and get search hits on. The fake can become self-reinforcing.

How was this example created?

Emma’s photograph was generated on the site thispersondoesnotexist (more specifically, the image is “imagined” by a generative adversarial network, or GAN: StyleGAN2, Dec 2019, by Karras et al. and Nvidia). For those of you who are uninterested in convincing fake pictures of students, the AI can also generate fake horses, fake cats, fake artwork and fake chemical substances. One must consider what impact the presence of unlimited (royalty-free) faces will have on those currently in modelling jobs who generally expect to get paid for their images. Are there grounds to be paid for part of the original images that have been used to create the new ones? Once again the law may be trailing far behind technology.

Emma’s words were generated in GPT-2 (OpenAI’s text generator) and you may have been surprised at how convincing the AI’s output can be. For a comparison I also tested InferKit (a hosted neural net service) and “asked” Emma (i.e., I seeded the algorithm with)

“I studied Web science” and received the answer:

“I studied Web science for six years, trying to understand what’s happening (to them). What’s behind the still-growing popularity of the Web? What makes people want to use a site over a competitor’s? How much real estate is being wasted on things like links and robots.txt?” 

A free version of InferKit is available here

We stopped short of generating a deep fake for Emma (for reasons of time/budget) but there are plenty of examples to be found including various fake Obama speeches.

More recently even smartphone apps (including the Chinese app Zao) have raised serious questions not only about the ability to quickly generate deepfakes but, more significantly, the rights and ownership of the resulting face scans which users may be inadvertently giving away to third party companies. You may recall Facebook being reprimanded for creating messages/posts that appeared to be from friends which recommended products and services – one can imagine how this might be extended by bad actors to generate deepfake messages encouraging users to click on malicious content.

This serves to underscore how much more important it is becoming to develop tools and processes that can detect the presence of bots, fake text and particularly deep fakes, and how important the education of users remains to guard private information and images – particularly of your own face!