Some top 100,000 websites collect everything you type—before you hit submit – Ars Technica

When you sign up for a newsletter, make a hotel reservation, or check out online, you probably take for granted that if you mistype your email address three times or change your mind and X out of the page, it doesn’t matter. Nothing actually happens until you hit the Submit button, right? Well, maybe not. As with so many assumptions about the web, this isn’t always the case, according to new research: A surprising number of websites are collecting some or all of your data as you type it into a digital form.

Researchers from KU Leuven, Radboud University, and University of Lausanne crawled and analyzed the top 100,000 websites, comparing two scenarios: a user visiting a site from within the European Union and a user visiting from the United States. They found that 1,844 websites gathered an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Many of the sites seemingly do not intend to conduct the data-logging but incorporate third-party marketing and analytics services that cause the behavior.

After specifically crawling sites for password leaks in May 2021, the researchers also found 52 websites in which third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. The group disclosed their findings to these sites, and all 52 instances have since been resolved.

“If there’s a Submit button on a form, the reasonable expectation is that it does something—that it will submit your data when you click it,” says Güneş Acar, a professor and researcher in Radboud University’s digital security group and one of the leaders of the study. “We were super surprised by these results. We thought maybe we were going to find a few hundred websites where your email is collected before you submit, but this exceeded our expectations by far.”

The researchers, who will present their findings at the Usenix security conference in August, say they were inspired to investigate what they call “leaky forms” by media reports, particularly from Gizmodo, about third parties collecting form data regardless of submission status. They point out that, at its core, the behavior is similar to so-called keyloggers, which are typically malicious programs that log everything a target types. But on a mainstream top-1,000 site, users probably won’t expect to have their information keylogged. And in practice, the researchers saw a few variations of the behavior. Some sites logged data keystroke by keystroke, but many grabbed complete submissions from one field when users clicked to the next.
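
To make the difference between those two variations concrete, here is a minimal sketch in Python that simulates a form session rather than running in a browser. It is not code from the study or from any real tracking script. The `Event` type, the `simulate` and `typing` helpers, the mode names, and the example address are all hypothetical, chosen only for illustration. The sketch shows what a script would have collected before any Submit click under each approach.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One simulated form event (hypothetical model, not a browser API)."""
    kind: str        # "input": the field's value changed; "blur": focus left the field
    name: str        # which form field the event belongs to
    value: str = ""  # field contents after the change (unused for "blur")


def typing(name, text):
    """Expand typed text into one "input" event per keystroke."""
    return [Event("input", name, text[:i]) for i in range(1, len(text) + 1)]


def simulate(events, mode):
    """Return the (field, value) payloads collected before any submit.

    mode="keystroke": collect on every change to a field.
    mode="blur": collect a field's full value when the user leaves it.
    """
    current = {}   # latest known value of each field
    collected = []
    for ev in events:
        if ev.kind == "input":
            current[ev.name] = ev.value
            if mode == "keystroke":
                collected.append((ev.name, ev.value))
        elif ev.kind == "blur" and mode == "blur":
            if current.get(ev.name):
                collected.append((ev.name, current[ev.name]))
    return collected


# The user types an email address, then clicks into the next field.
# Note that the session contains no submit event at all.
session = typing("email", "user@example.com") + [Event("blur", "email")]

per_keystroke = simulate(session, "keystroke")
on_blur = simulate(session, "blur")

print(len(per_keystroke), "payloads in keystroke mode")
print(on_blur)
```

In keystroke mode every partial value is captured along the way, including any typos the user later corrects. In blur mode only the finished value is captured, but it is still taken without the user ever pressing Submit.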

“In some cases, when you click the next field, they collect the previous one, like you click the password field and they collect the email, or you just click anywhere and they collect all the information immediately,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the study coauthors. “We didn’t expect to find thousands of websites; and in the US, the numbers are really high, which is interesting.”