For the past two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to assemble the largest ever first-person video dataset specifically to train deep learning image recognition models. AIs trained on the dataset will be better at controlling robots interacting with humans or interpreting images from smart glasses. “Machines will only be able to help us in our daily lives if they truly understand the world through our eyes,” says Kristen Grauman, who leads the project from FAIR.
This type of technology could support people who need help at home or guide people through tasks they are learning to complete. “The video in this dataset is much closer to how people observe the world,” says Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who is not involved in Ego4D.
But the potential misuses are clear and alarming. The research is funded by Facebook, a social media giant that was recently accused in the US Senate of putting profits before people’s well-being, a charge corroborated by MIT Technology Review’s own investigations.
The business model of Facebook and other Big Tech companies is to extract as much data as possible from people’s online behavior and sell it to advertisers. The artificial intelligence outlined in the project would extend that reach to people’s everyday offline behavior, revealing the objects around your home, the activities you enjoy, the people you spend time with, and even where your gaze lingers: an unprecedented degree of personal information.
“Once you take that out of the world of exploratory research and into something that is a product, there’s work to be done on privacy,” Grauman says. “That work could even be inspired by this project.”

The previous largest dataset of first-person video consisted of 100 hours of footage of people in their kitchens. The Ego4D dataset consists of 3,025 hours of video recorded by 855 people at 73 different locations in nine countries (the US, the UK, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda).
Participants came from a range of ages and backgrounds; some were recruited for their visually interesting occupations, such as bakers, mechanics, carpenters, and landscape architects.
Previous datasets typically consisted of semi-scripted video clips only a few seconds long. For Ego4D, participants wore head-mounted cameras for up to 10 hours at a time and captured first-person video of unscripted daily activities, such as walking down the street, reading, doing laundry, shopping, playing with pets, playing board games, and interacting with other people. Some of the footage also includes audio, data on where participants’ gaze was focused, and multiple perspectives on the same scene. Ryoo says it is the first dataset of its kind.
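To make the structure of such a dataset concrete, here is a minimal Python sketch of how one clip and its extra modalities (audio, gaze, synchronized viewpoints) might be represented. The field names and the `EgocentricClip` class are purely illustrative assumptions, not Ego4D’s actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class EgocentricClip:
    """One hypothetical record in a first-person video dataset.

    Field names are illustrative only; this is not Ego4D's real schema.
    """
    video_path: str        # footage from the head-mounted camera
    duration_hours: float  # recordings ran up to ~10 hours at a time
    location: str          # one of the 73 collection sites
    country: str           # one of the nine participating countries
    activities: List[str]  # unscripted activities, e.g. "doing laundry"
    audio_path: Optional[str] = None  # audio exists for only some clips
    gaze_track: Optional[List[Tuple[float, float, float]]] = None  # (timestamp, x, y) gaze points
    synced_views: List[str] = field(default_factory=list)  # other cameras on the same scene

# Example: a clip with video and gaze data but no audio or extra viewpoints
clip = EgocentricClip(
    video_path="clips/0001.mp4",
    duration_hours=2.5,
    location="site_12",
    country="India",
    activities=["shopping", "walking down the street"],
    gaze_track=[(0.0, 0.51, 0.48), (0.033, 0.52, 0.47)],
)
print(clip.activities)
```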