Robot vacuum cleaners can spy on private conversations

When your robot vacuum cleaner does its work around the house, beware that it could pick up private conversations along with the dust and dirt. Computer scientists from NUS have demonstrated that it is indeed possible to spy on private conversations using a common robot vacuum cleaner and its built-in Light Detection and Ranging (Lidar) sensor.

The novel method, called LidarPhone, repurposes the Lidar sensor that a robot vacuum cleaner normally uses for navigating around a home into a laser-based microphone to eavesdrop on private conversations.

The research team, led by Assistant Professor Jun Han from NUS Computer Science, and his doctoral student Mr Sriram Sami, managed to recover speech data with high accuracy. NUS students, Mr Dai Yimin and Mr Sean Tan Rui Xiang, as well as Assistant Professor Nirupam Roy from the University of Maryland, also contributed to this work.

Mr Sami shared, “The proliferation of smart devices – including smart speakers and smart security cameras – has increased the avenues for hackers to snoop on our private moments. Our method shows it is now possible to gather sensitive data just by using something as innocuous as a household robot vacuum cleaner. Our work demonstrates the urgent need to find practical solutions to prevent such malicious attacks.”

The work was presented at the Association for Computing Machinery’s Conference on Embedded Networked Sensor Systems (SenSys 2020) on 18 November 2020, where the team clinched the Best Poster Runner Up Award.

How the attack works

The core of the LidarPhone attack method is the Lidar sensor, a device which fires out an invisible scanning laser, and creates a map of its surroundings. By reflecting lasers off common objects such as a dustbin or a takeaway bag located near a person’s computer speaker or television soundbar, the attacker could obtain information about the original sound that made the objects’ surfaces vibrate. Using applied signal processing and deep learning algorithms, speech could be recovered from the audio data, and sensitive information could potentially be obtained.
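To make the idea concrete, here is a toy sketch of why a vibrating surface leaks sound through a reflected laser. It is not the researchers' pipeline, which used applied signal processing and deep learning on real Lidar readings. The sample rate, tone frequency, vibration amplitude and noise level below are all made-up assumptions, and the function names are hypothetical. The sketch simulates a reflection whose intensity wobbles slightly with a 300 Hz tone, then recovers that tone by finding the strongest frequency component.

```python
import math
import random


def simulate_reflection(tone_hz, sample_rate, n_samples, seed=0):
    """Simulated reflected-laser intensity: a constant level, a tiny
    vibration caused by the tone, and random sensor noise (all assumed)."""
    rng = random.Random(seed)
    return [
        1.0
        + 0.01 * math.sin(2 * math.pi * tone_hz * n / sample_rate)
        + rng.gauss(0, 0.005)
        for n in range(n_samples)
    ]


def dominant_frequency(samples, sample_rate, candidates):
    """Return the candidate frequency (Hz) with the largest DFT power."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # drop the constant reflection level
    best_freq, best_power = None, -1.0
    for freq in candidates:
        step = 2 * math.pi * freq / sample_rate
        re = sum(x * math.cos(step * n) for n, x in enumerate(centered))
        im = sum(x * math.sin(step * n) for n, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


SAMPLE_RATE = 2000  # samples per second (assumption, not from the article)
signal = simulate_reflection(300, SAMPLE_RATE, 2000)
estimate = dominant_frequency(signal, SAMPLE_RATE, range(50, 1000, 10))
print(estimate)
```

Real speech is far messier than a single tone, which is presumably why the team needed deep learning to turn the recovered signal into digits and music labels.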

In their experiments, the researchers used a common robot vacuum cleaner with two sources of sound. One was the voice of a person reading out numbers played from a computer speaker, while the other source was music clips from television shows played through a television soundbar.

The team collected more than 19 hours of recorded audio files and passed them through deep learning algorithms that were trained to either match human voices or identify musical sequences. The system was able to detect the digits being spoken aloud, which could constitute a victim’s credit card or bank account numbers. Music clips from television shows could potentially disclose the victim’s viewing preferences or political orientation. The system achieved a classification accuracy rate of 91 per cent when recovering spoken digits, and a 90 per cent accuracy rate when classifying music clips. These results are significantly higher than a random guess of 10 per cent.
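The 10 per cent baseline is what a blind guess scores when there are ten equally likely classes, such as the digits 0 to 9. The short sketch below assumes ten classes, which the quoted figure implies but the article does not state for the music task; the function names are illustrative only. It computes the baseline and checks it with a simulated guesser.

```python
import random


def chance_accuracy(num_classes):
    """Expected accuracy of a uniform random guess over equally likely classes."""
    return 1.0 / num_classes


def simulated_guess_accuracy(num_classes, trials, seed=0):
    """Estimate the same baseline empirically by guessing at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        truth = rng.randrange(num_classes)
        guess = rng.randrange(num_classes)
        hits += truth == guess
    return hits / trials


baseline = chance_accuracy(10)
simulated = simulated_guess_accuracy(10, 100_000)
print(f"theoretical baseline: {baseline:.0%}, simulated: {simulated:.1%}")
```

Against that baseline, the reported 91 per cent for spoken digits is roughly nine times better than guessing.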

The researchers also experimented with common household materials to test how well they reflected the Lidar laser beam and found that the accuracy of audio recovery varied between different materials. They discovered the best material for reflecting the laser beam was a glossy polypropylene bag, while the worst was glossy cardboard.

Preventing such attacks

To prevent Lidars from being misused, the researchers recommend that users consider not connecting their robot vacuum cleaners to the Internet. The team also recommends that Lidar sensor manufacturers incorporate a mechanism that cannot be overridden, to prevent the internal laser from firing when the Lidar is not rotating.

“In the long term, we should consider whether our desire to have increasingly ‘smart’ homes is worth the potential privacy implications. We might have to accept that each new Internet-connected sensing device brought into our homes poses an additional risk to our privacy, and make our choices carefully,” shared Asst Prof Han.

Future work

The team is working on applying ideas learnt from LidarPhone to autonomous vehicles, which also use Lidar sensors, as these sensors could likewise be used to eavesdrop on conversations happening in nearby cars through minute vibrations of the car windows. They are also looking at the vulnerability of active laser sensors found on the latest smartphones, which could reveal further privacy issues.

The post Robot vacuum cleaners can spy on private conversations appeared first on Web Science Trust.

“Data are” or “data is”? A pedant writes

by Kieron O’Hara

It is one of the divisive questions of our times. Is the word ‘data’ singular or plural? Some say “this data is …”, “the data doesn’t tell us …”; others “these data are …”, “the data don’t tell us …”.

The singular use, often heard in computer science departments and probably more commonly in popular speech, treats ‘data’ as an uncountable noun or singular mass noun, like ‘water’ or ‘education’. It therefore has no plural – we don’t speak of ‘the datas’, any more than we speak of ‘the educations’ or ‘the waters’ (actually, ‘the waters’ is a usable term, but specifically refers to a source of spring water: ‘I came to Casablanca for the waters’). Such nouns only take plurals when combined with a specific unit of measurement.

The plural use, more often heard in social science and statistics departments, says that ‘data’ is the plural of ‘datum’. A datum is something like ‘x = 40’, and if we create a file containing the datum ‘x = 40’ and the datum ‘y = 50’, we have data.

There are two common views of the rights and wrongs of this. One is that either is OK. As long as use is consistent and you make yourself understood, it doesn’t really matter. The other is that the plural use is correct, because ‘data’ is a Latin word, the plural of ‘datum’, which means ‘the given’. You show your ignorance of the classical heritage of English if you misuse the word.

I used to hold the first of these views. Clearly, the roof won’t fall in if we preserve both uses, because they are both conventional, more or less. It’s bad practice to have both uses in the same piece of work, so having chosen one convention, stick with it for good style, but don’t angst about it. Personally, I tended to use ‘data’ as a singular mass noun, but I didn’t hold it against others who didn’t.

However, having been ticked off often enough by holders of the second of these views, I reflected upon it, and I now think the weight of argument is in favour of a third view: that ‘data’ is a singular mass noun, and that the plural use, even if verified by Cicero himself, is simply incorrect.

We can take into account four considerations. None of them in itself is decisive, but cumulatively I believe that at a minimum they put the burden of proof on the pluralists. If you are either a pluralist or an agnostic, you need counterarguments.

1. You are not speaking Latin.

‘Data’ is a word of English. It happens to have a homograph in Latin, because we borrowed the word. There are many other English words with this property, from ‘abdomen’ to ‘vomit’, and we don’t worry about their Latin grammar.

When we use words from other languages in English sentences, we often write them in italics, and then we do worry about their grammar. After all, if you are showing off, you had better show off correctly. For instance, hoi polloi is a Greek phrase meaning ‘the people’, or ‘the many’, and figuratively ‘the rabble’. ‘Hoi’ is a definite article, so it is incorrect to say ‘the hoi polloi’, because that means ‘the the rabble.’ There are lots of other phrases we borrow, and we have to get them right – for instance, the correct plural of the French phrase ‘fait accompli’ is ‘faits accomplis’, even if we use it in an English sentence.

‘Data’ is not like this. It is a word of English that should behave like an English word. That does not tell us, of course, whether it is a singular mass noun or a plural, but it does tell us that the Latin rules are irrelevant.

2. You are being inconsistent.

There are other singular words in English which began as Latin plurals. ‘Agenda’ is one of them – it is the Latin plural of ‘agendum’, that which is to be done. Yet no-one in their right mind uses ‘agenda’ in English as a plural. No-one says ‘The agenda are on a slide, and I’m projecting them onto the screen.’ Absolutely everyone, even a data-pluralist, says ‘The agenda is on a slide, and I’m projecting it onto the screen.’ But if it is acceptable to treat ‘data’ as an English plural, why not ‘agenda’?

3. What do the French do?

Ah, the pluralist might say, others treat their words for ‘data’ as plurals. The French say ‘les données’, while the Dutch ‘de gegevens’. We should take a lesson from them.

Ah, I reply, but they have not absorbed the Latin word. Rather, they have both translated it as ‘the given’. Which is fine – we certainly do not want to dictate to the French and Dutch how they should communicate. But note that when we translate Latin ‘data’ into English, we get a singular term, ‘the given’. They get a plural term, and hence their words for ‘data’ are plural. Had we followed their strategy, we would have got an unambiguously singular term, and we would talk about ‘the given’, ‘givenbases’, ‘big given’, ‘metagiven’ and so on. No-one would ever say ‘the givens.’

4. What do the English do?

This last point trades on something about the way that English treats abstractions such as ‘the given’ or ‘the data’. The singular tends to be used. And we can see this if we compare ‘data’ with words of similar function in English, such as ‘information’, ‘knowledge’ and ‘wisdom’. These are also singular mass nouns – if we add more knowledge to our (singular) knowledge, we end up with (singular) knowledge. This kind of abstraction over semantic/epistemological concepts requires, in English, if not in Latin, French or Dutch, a singular grammar.

Not only should we not be surprised to find that ‘data’ acts in the same way, it will actually lead to greater conceptual clarity – in English – if we reject the pluralist view and the agnostic view, and accept the singular view.

It should be said that the word ‘datum’ is also useful – we do sometimes want to refer to a single item, and ‘datum’ will do for that, as well as alternatives like ‘piece of data’ and ‘datapoint’. In the same way, we talk of a ‘piece of information’, an ‘item of knowledge’, a ‘nugget of wisdom’. I don’t rule out the use of ‘datum’, only the mistaken view that in English it is the singular of ‘data’.

One of the penalties of being a pedant is that, despite your being right, no-one cares, and I don’t suppose you do. But anyway, it is off my chest now, and I can go on with the rest of my life.

The Web Science Blog invites opinion pieces from academia, government and business and, whilst hopefully informative and entertaining, these pieces do not necessarily reflect the opinions of the Web Science Trust, its members, staff or trustees.


Recent perspectives on VC

As video conferencing (VC) has become the new normal, businesses, government services and academia are starting to confront what it means if VC (Zoom, Skype, FaceTime et al.) becomes the default method (and in some cases currently the only method) to allow “live” interaction between colleagues, service users/customers and students.

The fall-back from live interaction to VC was quickly embraced, but the jury is out on whether this is sustainable as the norm. Below are several perspectives on life and work on Zoom.

‘The Netflixisation of academia’: is this the end for university lectures?

Zoom fatigue is real

Working remotely? Who owns all the content you are posting online?

Who is Emma and is she F.A.K.E. (Finnish Academy of Knowledge Engineering)?

WSTNet Student Profile: Emma Heikkinen

Emma completed her PhD in Web Science at the Finnish Academy of Knowledge Engineering and decided to move into Web Science to combine her interest in Web Design with a stronger understanding of Web technologies.

We spoke to Emma about how she got into Web Science and what she is doing now having moved to London since getting her PhD:

I’ve always had an interest in web design and I have published my first (published) article, but I’m not really that interested in the technical aspect of it and didn’t really have a sense of how web developers and designers worked.  

But the idea of the city was very appealing and I realized that the web design section of Google’s (now Alphabet) employee site may be the best place to get your understanding about the web. Even though I am not a senior employee, I have received an offer of promotion to engineer at Google.

So what do you think: is Emma F.A.K.E.?

For the more eagle-eyed amongst you – you may have smelled a rat when we suggested that Emma did her PhD at the Finnish Academy of Knowledge Engineering (Hint: FAKE) or perhaps you knew (or checked!) that there is no WSTNet lab in Finland (yet).

What you may not realise is that Emma, her picture and her words were entirely AI generated.

If you did not take the time to check or review this piece, perhaps you would still believe Emma and the F.A.K.E. were real – you might even share your belief with other colleagues, commenting on “how the Finnish are into Knowledge Engineering” – it’s easy to get pulled into the lie. Google may index this page, your comments etc. and – without an understanding of the purpose – may further establish Emma and FAKE as entities we can search for and get search hits on. The fake can become self-reinforcing.

How was this example created?

Emma’s photograph was generated on the site thispersondoesnotexist (more specifically, the image is “imagined” by a GAN, or generative adversarial network: StyleGAN2 (Dec 2019) – Karras et al. and Nvidia). For those of you who are uninterested in convincing fake pictures of students, the AI can also generate fake horses, fake cats, fake artwork and fake chemical substances. One must consider what impact the presence of unlimited (royalty-free) faces will have on those currently in modelling jobs who generally expect to get paid for their images. Are there grounds to be paid for part of the original images that have been used to create the new ones? Once again the law may be trailing far behind technology.

Emma’s words were generated in GPT-2 (OpenAI’s text generator) and you may have been surprised at how convincing the AI’s output can be. For a comparison I also tested InferKit (a hosted neural net service) and “asked” Emma (i.e., I seeded the algorithm with)

“I studied Web science” and received the answer:

“I studied Web science for six years, trying to understand what’s happening (to them). What’s behind the still-growing popularity of the Web? What makes people want to use a site over a competitor’s? How much real estate is being wasted on things like links and robots.txt?” 

A free version of InferKit is available here

We stopped short of generating a deep fake for Emma (for reasons of time/budget) but there are plenty of examples to be found including various fake Obama speeches.

More recently even smartphone apps (including the Chinese app Zao) have raised serious questions not only about the ability to quickly generate deepfakes but, more significantly, the rights and ownership of the resulting face scans which users may be inadvertently giving away to third-party companies. You may recall Facebook being reprimanded for creating messages/posts that appeared to be from friends which recommended products and services – one can imagine how this might be extended by bad actors to generate deepfake messages encouraging users to click on malicious content.

This serves to underscore how much more important it is becoming to develop tools and processes that can detect the presence of bots, fake text and particularly deep fakes, and how important the education of users remains to guard private information and images – particularly of your own face!

European Parliament advised to build its own ‘European Internet’ to block services supporting unlawful activities

This article originally appeared in Computing 9th June 2020

A policy paper requested by the European Parliament’s committee on the Internal Market and Consumer Protection recommends that the European Union develop a “European Internet” which, like the “Great Firewall of China”, would block services supporting unlawful activities in other countries.

Many governments and human rights groups in Europe currently criticise the Chinese government for its use of a firewall that denies Chinese people open access to the Internet for free exchange of information and ideas. Critics argue that this “Great Firewall of China” helps the Chinese government to suppress opposition to its one-party system.

But it now appears that policy makers in the EU have also started noticing some advantages of this approach.

“The EU should include an action plan for a digital cloud – a European Internet – in the DSA”, suggests the policy document [pdf], which is authored by experts from Hamburg-based consultancy Future Candy.

According to these experts, the EU’s own firewall/cloud/internet would help foster a digital ecosystem based on data and innovation in the European region. Unlike the Chinese approach that enables Beijing to suppress democratic movements in the country, the EU’s firewall would be founded on the pillars of democratic values, transparency, user friendliness, data protection and data accessibility. Moreover, it would also help in setting standards and driving competition in the region.

Foreign web services would be allowed to join the EU’s digital ecosystem, but for that, they would need to adhere to the rules and standards set by the European Parliament.

The document further advises the parliament to take various measures ahead of the proposed Digital Services Act (DSA) that will eventually replace a directive introduced nearly 20 years back to govern online services in the EU.

The policy document recommends starting a funding programme for European firms to help build state-of-the-art eGovernment services. This funding project would invest money in start-ups and other firms that demonstrate a strong desire to create infrastructure and digital services to enable a digital world of government.

The policy paper also recommends building a Visionary Communication Programme that would include regular legislative updates of the DSA and would also inspire European citizens about digital developments going on in the region.