By Dr. Aydin Ulas, Data Scientist at Truata
The changing dynamics of the digital world have led to several privacy challenges for businesses, large and small. This is placing increasing pressure on them to evolve their processes and strategies. Much of the burden stems from the sheer volume of data present today, and this exponential growth is set to continue in step with the accelerating pace of digital transformation. In fact, the volume of data is predicted to balloon to 175 zettabytes (ZB) by 2025. Today, it is simply beyond human capability to process data effectively and protect privacy without the assistance of privacy-enhancing technologies (PETs).
This has led to an explosion of adaptive machine learning (ML) algorithms that can wade through the mountain of data while continuously, and efficiently, changing their behaviour in real time as new data streams are fed into them. However, while ML is key to leveraging and learning from big data at scale, it can create privacy challenges. In fact, traditional ML requires data to be stored on a centralised server for analysis, which can include transporting data to cloud environments; this opens the door to a plethora of security and privacy implications. As such, the technology has also met resistance from consumers who, despite preferring personalisation when it comes to ad targeting, do not want to lose control of their personal data to feed that convenience. Worries over how data is stored and used after being collected and transferred to a centralised location are also impacting digital trust and sparking cynicism over AI advancements that are being fuelled by data.
Taking it to the edge
These privacy and security concerns have led the charge for ML technology that can work in a way that preserves consumer privacy, which is why federated learning (FL) has gained such momentum. Federated learning, put simply, is a decentralised form of machine learning. It is a method of training an algorithm on user data across multiple decentralised edge devices or servers without needing to exchange or transfer that data to a central location. This means that the data remains ‘sticky’ to a consumer’s mobile phone, tablet or laptop; however, it does pose the challenge of how to find a common representation over these devices.
With decentralised federated learning, a global model is generated on a central server and the data to train this model is distributed across edge devices. Each edge device uses the model to compute updated parameters with its own data and then sends these parameters to the central node. The central node then computes an aggregated parameter set from the parameters sent by the edge devices and sends this back to the edge. This is a good compromise, as the data stays with its owner while still being used to create insights centrally. It is a case of bringing the mountain to Muhammad, if you will: the model is brought to the data, where it can be trained and updated, rather than the data having to go to the model. Federated learning is one of the best examples of the new breed of edge computing, where computation and data storage are brought closer to the source of data. In the case of targeted advertising, that source is the consumer themselves.
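The round described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a toy one-dimensional linear model and a size-weighted average as the aggregation rule (the article does not specify either); the names `local_update`, `aggregate` and `federated_round` are hypothetical and not taken from any library.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[float, float]  # (x, y) pair held on one device


def local_update(weights: Vector, data: List[Sample],
                 lr: float = 0.05, epochs: int = 5) -> Vector:
    """Train y = w*x + b on one device's data; the raw data never leaves this function."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return [w, b]


def aggregate(updates: List[Vector], sizes: List[int]) -> Vector:
    """Central node: average the device parameters, weighted by local sample count (assumed rule)."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


def federated_round(global_weights: Vector,
                    clients: List[List[Sample]]) -> Vector:
    """One round: send the model out, collect parameters only, aggregate centrally."""
    updates = [local_update(list(global_weights), data) for data in clients]
    sizes = [len(data) for data in clients]
    return aggregate(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical devices whose private samples both follow y = 2x + 1.
    clients = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    ]
    weights = [0.0, 0.0]
    for _ in range(500):
        weights = federated_round(weights, clients)
    print(weights)  # approaches the shared slope and intercept
```

Only the two-element parameter list crosses the device boundary in `federated_round`; the `(x, y)` samples are read solely inside `local_update`.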
Looking to support a privacy-first future for web advertising, and protect its biggest revenue stream, Google is a leading proponent of the technology and has recently launched its Federated Learning of Cohorts (FLoC) as a replacement for traditional third-party cookies, which it plans to stop supporting by 2023.
With FLoC, Google groups consumers into cohorts based on their browsing history for the purpose of interest-based targeted advertising. FLoC is part of the company’s wider Privacy Sandbox initiative, which includes several other advertising-related technologies with bird-themed names. In a nutshell, the user’s browser uses an algorithm (developed by Google) to label a user (put the person in a “cohort”) and this label travels everywhere with the user. The algorithm could use any information available to the browser on the edge device to model or select the cohort; the initial proof of concept uses domains visited by the user. To keep privacy constraints in check, a central node is needed to handle cases where a cohort contains only a small number of individuals.
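To make the idea concrete, the sketch below shows one way a browser could turn visited domains into a cohort label, and how a central node could suppress cohorts that are too small. It is an illustrative assumption, not Google's implementation: it uses a generic SimHash-style locality-sensitive hash, and the function names, the 8-bit cohort space and the threshold `k` are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Optional


def simhash_cohort(domains: Iterable[str], bits: int = 8) -> int:
    """On the device: derive a cohort ID from visited domains (bits must be <= 32)."""
    totals = [0] * bits
    for domain in set(domains):  # each distinct domain votes once
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        value = int.from_bytes(digest[:4], "big")
        for i in range(bits):
            totals[i] += 1 if (value >> i) & 1 else -1
    cohort = 0
    for i, total in enumerate(totals):
        if total > 0:  # majority vote decides each bit of the label
            cohort |= 1 << i
    return cohort


def enforce_min_size(assignments: Dict[str, int],
                     k: int) -> Dict[str, Optional[int]]:
    """On the central node: withhold any cohort label shared by fewer than k users."""
    counts = Counter(assignments.values())
    return {user: (cohort if counts[cohort] >= k else None)
            for user, cohort in assignments.items()}


if __name__ == "__main__":
    history = ["news.example", "shoes.example", "travel.example"]
    print(simhash_cohort(history))
    print(enforce_min_size({"u1": 1, "u2": 1, "u3": 2}, k=2))
```

In this sketch the central node sees only cohort labels, never the browsing history; users with similar histories tend to land in the same cohort because most bits of their hashes agree.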
Spreading the load
While there are multiple benefits to federated learning, it does have certain limitations. Not least, the technology requires frequent communication between the nodes during the learning process in order to work. Thus, it requires not only enough local computing power and memory, which might affect user experience, but also enough of the user’s bandwidth to exchange the parameters of the machine learning model in real time.
Luckily, with the emergence of technologies such as 5G, today’s communications infrastructure is more than robust enough to handle this. Plus, the edge devices that technologies such as Google’s FLoC are typically talking to tend to be powerful mobile phones with several gigabytes of memory. This means that certain technical barriers to federated learning have been all but removed.
Moving computing to the edge can, in fact, be thought of as a positive for businesses. Federated learning spreads the load. Because the computation is being undertaken on powerful consumer devices, businesses don’t need to invest in as much costly central computing power as they otherwise would.
Plugging the gaps
Because federated learning enables multiple actors to build a common, robust ML model without sharing data, it addresses critical issues for ML, such as data security, data access rights, and access to heterogeneous data. Since the data is split into disparate parts held locally on devices and only learning parameters are exchanged, the system as a whole becomes more difficult to hack. However, whilst it will undoubtedly become an important part of the modern marketing technology stack, federated learning must be implemented carefully. Even though such a technique is, by its very nature, a leap forward in data privacy, it is still imperative that privacy-by-design principles are observed at all times.
Privacy-by-design proactively embeds privacy into the design and operation of IT systems, networked infrastructure, and business practices. As such, it is important to keep its principles front and centre of your thinking. Only when federated learning is paired with other privacy mechanisms—such as secure multi-party computation, differential privacy and quantitative measurement—can privacy risks be considered addressed. With federated learning, therefore, it is a case of plugging the gaps to ensure you remain compliant with increasingly stringent privacy regulations.
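As one illustration of such a pairing, a device can clip its parameter update and add random noise before sending it, which is the basic mechanism behind differentially private federated learning. The sketch below is a simplified assumption: the function names and default values are hypothetical, and the noise scale is not calibrated to any particular privacy budget.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def privatise_update(update: List[float],
                     max_norm: float = 1.0,
                     noise_multiplier: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip, then add Gaussian noise proportional to the clipping bound (assumed scheme)."""
    rng = rng or random.Random()
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


if __name__ == "__main__":
    raw_update = [3.0, 4.0]  # hypothetical parameter delta from one device
    print(privatise_update(raw_update, max_norm=1.0, noise_multiplier=0.5))
```

Clipping bounds how much any single device can influence the aggregated model, and the noise masks the remainder; choosing `noise_multiplier` for a given privacy guarantee requires a separate accounting step that is not shown here.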
Despite the aforementioned benefits of FLoC and federated learning, there is still a privacy question that should be discussed: “Should the model parameters that are transferred be considered as personal information, and where do we draw the line?” This surfaces a new layer of global privacy complexities that would need to be navigated since some laws prevent personal data from leaving the jurisdiction, which makes privacy-by-design all the more important.
A challenge of the digital era
Legacy ML techniques have created, and continue to create, privacy and security issues. Consumers are increasingly resistant to having data analysed once it has been removed from their devices, and there will always be re-identification risks associated with that data. Federated learning, however, reduces this risk because the raw data is never in transit and always remains on the consumer’s device. Quite simply, if your phone dies, your data dies with it.
As the global wave of privacy legislation and privacy activism continues to accelerate the need for privacy-preserving techniques, federated learning is one worthy example of the progress that is being made to mould a data-led economy that is underpinned by privacy. The adoption of privacy-enhancing technologies is not only transforming the way businesses approach challenges with compliance, but also the way they overcome operational inefficiencies and accelerate data-driven strategies or further evolve AI initiatives. Rather than looking at data privacy as a business blocker, those embracing a privacy-by-design ethos are understanding that protecting privacy is a gateway to a greater depth of insights that can fuel growth and power innovation.