18
Sep

Hrishikesh Thite is a second year management student at the Indian Institute of Management Calcutta. This unabridged article was originally written as a term paper for a course on Analysis of ICT Markets. It is fairly long, so consider yourself forewarned. There is a summary of sorts, at the end, for those who want to briefly visit the gist of the paper.


Introduction

It is said that the best security is provided by a system that has no access. At all. Unfortunately, such a system would hardly be usable by anyone. There has always been a trade-off between accessibility and security, and between usability and privacy. With an increasing online presence, and with single large corporations owning or collaborating over a multitude of services that govern the always-connected online lifestyle, individual and corporate users need to reevaluate the security and privacy angle. In this term paper, I will examine the policies of Google and its partners in the context of security, privacy and convenience.

There are broadly two areas to consider. The first is the service provider itself, which in this case is Google; the second is the system that connects the service provider to the end-user, which means the Internet, the users’ local ISP and so on. It could be argued that Google has no control over the network, but there are ways of securing the network, such as the use of HTTPS, that need to be adopted and implemented by the service provider. Another way to classify the issue is to separate the security of raw data and physical access from security in the logical or intellectual-property sense.

Google, Privacy & Security

Google states that[1]

  • It does not own any user data
  • It will not share it with any third-party
  • It will allow users to keep it as long as they want
  • It will allow users to extract data from its services

However, most of the above points raise follow-on questions. The first point is a no-brainer. Other providers, such as Facebook[2] and Flickr[3], have tried changing their policies on data ownership, which users vehemently opposed. Google, so far, has continued to use the saner ownership policy. But in a sense, Google does not need to “own” data for its business purposes (discussed in detail later). In the second point, Google refers to its privacy policy (examined later) when it says that it won’t share any data with third parties. Moreover, given the many Google services across which data may be (automatically) shared, this author is not sure whether the term third party even makes conventional sense. The third point, about data retention, carries an implicit condition: Google will allow users to keep data on its services so long as, one, Google itself continues to exist, and two, Google finds it economically and financially appealing to let users use its services (for free, or almost free). Again, it could be argued that Google is sufficiently large and has deep enough pockets to survive almost anything thrown at it, or that the same would be true of any service provider; this has always been the most-debated point about software as a hosted service. The final point is an answer to the doubts raised above, in that Google promises that it will allow users to extract their data from its services at any point of time. It does not, however, explicitly state what happens once the data, or the account itself, is deleted. (It does say that the deletion will take five days, but not whether and how it ensures complete and secure deletion.)

A typical Google account, combined with various Google services, is a rich source of personal information, including financial data such as stock portfolios and credit card details (for purchases via Froogle, for example), location and positioning information (via Latitude and Maps), users’ connections and social networks (via Gmail and Orkut), and user passwords and login information (stored via Wand / AutoFill on the Google Toolbar), apart from the users’ clickstream and other behavioural data.

Employee Access, Abuse & Content Neutrality

Google goes on to say[4] that its employees cannot access user data unless explicit permission is granted, for purposes of troubleshooting, for example. It then also says that employees or automated systems may take down data that violates the terms of service. It is not clear to this author how they can do this without continuous monitoring. It appears that Google’s employees (or automated systems) monitor aggregate figures, such as spikes in bandwidth usage or extreme changes in access patterns, and then investigate those. They already share these statistics and trends with administrators and users of Google Analytics. The other scenario is the case where one user explicitly complains about another user or their data. In both cases, Google’s employees clearly access user data. Google claims[5] that it will contact the primary account administrator in the event that content such as malware, pornography, child pornography, or copyrighted or trademarked material is taken down. This is especially true on its online video sharing service, YouTube. Google is, thus, not content neutral.

Google encourages third parties to report abuse of its systems, and also actively tries to detect violations itself. It will not ask users for an explanation, or inform them before deleting content, but, as stated above, will contact them after the deletion has taken place. This leaves the system open to abuse. It is not clear from the openly available data whether a user can contest such deletion and / or termination.

Google Apps and its related services allow domain administrators (which should ideally belong to the same organization as the users) to access all end-user accounts and their associated data[6]. This author is not sure if corporate and other general-purpose users are aware of this, or if Google or the account administrator makes any effort to publicize this.

An interesting reverse case: A Google executive got sued[7] for a third-party posting of a video showing a disabled teen being harassed by peers. The exact charge is “defamation and failure to exercise control over personal data.” Whether Google can monitor its vast information repositories without violating its own privacy and security guidelines, and whether it is responsible for user-generated content at all, are both debatable questions.

Google & the Law

While Google tries to function like a Swiss bank, untouchable and high-up-there, it also tries to play it safe and by the law (emphasis mine):

“Google does not share or reveal private user content such as email or personal information with third parties except as required by law, on request by a user or system administrator, or to protect our systems. These exceptions include requests by users that Google’s support staff access their email messages in order to diagnose problems; when Google is required by law to do so; and when we are compelled to disclose personal information because we reasonably believe it’s necessary in order to protect the rights, property or safety of Google, its users and the public.”[8]

There are many issues with the above. Google reserves the right to determine whether it is necessary to disclose personal information, without explicitly informing the user before such an incident occurs. It puts this in its privacy policy, and because users have accepted the policy, they are deemed to have given away their right to protect their personal information. Google will give out any user information to a law enforcement agency or a government entity, leading to the Big Brother scenarios common in Western nations.

The only real protection in the light of the above scenarios is security by obscurity. If you are not an “interesting individual”, your data is safe. There are ways to secure your data and still continue to use Google services, such as using local encryption or an additional layer of password protection, but this will only protect the data itself. Meta-data from communications (to, from, time-data etc.) will still remain exposed. The meta-data is enough[9] for the agencies to sue you or arrest you and force you to provide them the access codes or passwords anyway.
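The point about meta-data can be illustrated with a minimal sketch. It assumes a user who encrypts the message body locally before handing it to a hosted mail service; the one-time pad, the addresses and the function names are all hypothetical choices made for illustration, not a recommendation of a particular scheme. The body becomes unreadable to the provider, but the headers needed to deliver the message stay in the clear.

```python
import secrets
from email.message import EmailMessage


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (a one-time pad if the key is random and used once)."""
    if len(data) != len(key):
        raise ValueError("one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def build_encrypted_message(sender: str, recipient: str, body: str):
    """Return (message, key). Only the body is encrypted; the headers stay readable."""
    plaintext = body.encode("utf-8")
    key = secrets.token_bytes(len(plaintext))  # kept locally, never given to the provider
    ciphertext = xor_bytes(plaintext, key)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "(encrypted)"
    msg.set_content(ciphertext.hex())
    return msg, key


msg, key = build_encrypted_message("alice@example.com", "bob@example.com", "Meet at noon")

# What the hosting provider (or anyone with access to the mailbox) can still read:
visible_metadata = {name: msg[name] for name in ("From", "To", "Subject")}
print(visible_metadata)

# Only the holder of the key can recover the body.
recovered = xor_bytes(bytes.fromhex(msg.get_content().strip()), key).decode("utf-8")
print(recovered)
```

Who wrote to whom remains visible whatever is done to the body, and the same holds for the time of delivery that a mail server would record.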

Google is registered (via a $200 self-certification?) with the U.S. Safe Harbor whose privacy principles include Notice, Choice, Onward Transfer, Security, Data Integrity, Access and Enforcement[10]. But an organization must decide itself if Google is compatible with all the laws and regulations it may be subjected to. In other words, it passes the buck to the organization (or individual) using Google services.

Google’s abuse policy[11] goes on to state that:

•  As a provider of content creation tools and hosting services, Google is not in a position to mediate or adjudicate disputes between third parties. For matters involving trademarks or impersonation, we recommend that you raise your concerns directly with the creator of the content in question.

•  If you believe that someone is impersonating your identity for the purposes of fraud, we recommend that you contact law enforcement or consumer protection authorities.

•  If you choose to pursue legal action against the content owner, we are prepared to accept valid court orders concerning the content in question. To submit a valid court order, please contact us at apps-abuse-report@google.com.

The above shows that Google is trying to be content neutral, so long as the law allows it to be. This is contrary to what we saw in an earlier statement, where Google said that it can choose to delete user data.

Handling User Information

Google’s note about the way it handles user information[12] states (emphasis mine):

“We may combine the information you submit under your account with information from other Google services or third parties in order to provide you with a better experience and to improve the quality of our services. For certain services, we may give you the opportunity to opt out of combining such information.

We use cookies to improve the quality of our service, including for storing user preferences, improving search results and ad selection, and tracking user trends, such as how people search. Google also uses cookies in its advertising services to help advertisers and publishers serve and manage ads across the web.”

From another page: Data is scanned and indexed[13] for user’s own search, for spam filtering and virus detection and for displaying context sensitive ads. However, processes are automated, involve no human interaction, and data is not part of the general google.com index (unless information is explicitly published). In yet another case, a software issue resulted in some users’ documents in Google Docs being shared with their contacts without the users’ permission[14]. This may be unintentional, but is definitely possible.

The problem is compounded by combining information from various sources or “mashing”. Most online data is believed to exist in databases that by themselves do not contain full or meaningful information. However, in a bid to improve and integrate user experience and the quality of its services, Google routinely brings all of this information together and delivers it via its many offerings. It also actively tracks this information, though it claims it does this only at an aggregate level. However, some of its services such as the Web History and Search History delivered via the Google Toolbar will store the users’ click-stream in extreme detail with a Google Account. It does this with the users’ consent and it is generally understood that the user opted to provide these details, but does the user really comprehend the implications of this blatant and unrestricted information sharing?
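A rough sketch of what “mashing” means in practice is given below. The three data sets, the account id and the function name are invented for illustration and say nothing about how Google actually stores or joins its data. Each source on its own reveals little; joined on a shared account id they form a single profile.

```python
from typing import Dict, List

# Three hypothetical, individually unremarkable data sets keyed by the same account id.
search_history = {"user42": ["cheap flights", "symptoms of flu"]}
location_log = {"user42": ["railway station", "airport"]}
contacts = {"user42": ["bob@example.com"]}


def mash(account_id: str, **sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Join every source on the shared account id into one profile."""
    return {name: records.get(account_id, []) for name, records in sources.items()}


profile = mash(
    "user42",
    searches=search_history,
    locations=location_log,
    contacts=contacts,
)
print(profile)
```

The only thing the join needs is a common key, which is exactly what a single login across many services provides.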

Third Party Add-ons & Cross-Site Sharing

Google publishes an Application Programming Interface (API) that allows third party (external) developers to create applications that use Google services, and possibly user data, with the idea that this will further add value to the overall offering. Google does not have an explicit program for certifying these application developers. It does not guarantee that all of these add-ons, widgets or apps are free from malicious intent. Postini[15] is a reputed service that can be added to Google Apps in order to add email message security, discovery, archival and recovery. It allows companies to comply with regulations such as the Sarbanes-Oxley Act. However, it also exposes user data to a completely different entity, even though the end-user feels and for all practical purposes believes that she is interacting only with Google. Facebook and Orkut third-party applications, for example, get access to your entire social network once installed. Aggregators and API publishers such as Google make it extremely easy to install such add-ons. There is simply no control over what such an application can do; it is largely based on “good old trust” and “good faith efforts”, more so if it is closed source and unapproved.

Google does not use web beacons. However, they present an interesting case. Web beacons are small one-pixel wide images that allow sites to better understand the traffic patterns within their (and network-affiliated) domains and, subsequently, adjust their content to better respond to their visitors’ interests. By mining “conversations” between the web and ad servers and the users’ browser, advertisers can tune their marketing campaigns for optimal effectiveness. Google’s cookies and partner network’s web beacons are cross-site and cross-browser, some even work with rich-email clients and RSS feed readers. This means that Google’s partners and Google can collaborate to serve advertisements to users and collect an even larger chunk of user information, by observing user behavior over a much larger spectrum of users’ online life. The undesirable side-effect, of course, is that they also uniquely identify a user, and can easily link all online activity to an email or user account.
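As a minimal sketch of the mechanism, a web beacon can be thought of as an invisible image whose URL carries the tracking parameters. The host name, the parameter names `uid` and `page`, and both function names below are assumptions made for illustration; in practice the identifier often travels in a cookie rather than in the URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BEACON_URL = "https://tracker.example.com/pixel.gif"  # hypothetical ad-network endpoint


def beacon_tag(user_id: str, page: str) -> str:
    """HTML for a 1x1 image whose URL carries the tracking parameters."""
    query = urlencode({"uid": user_id, "page": page})
    return f'<img src="{BEACON_URL}?{query}" width="1" height="1" alt="">'


def log_beacon_hit(request_url: str) -> dict:
    """What the tracking server can read from the image request alone."""
    params = parse_qs(urlparse(request_url).query)
    return {"uid": params["uid"][0], "page": params["page"][0]}


tag = beacon_tag("abc123", "/news/article-7")
print(tag)

# The browser fetches the image automatically; the server sees this URL.
requested_url = tag.split('"')[1]
hit = log_beacon_hit(requested_url)
print(hit)
```

Because the same tag can be embedded on any partner site, or in an HTML email, every fetch reports the same identifier back to one server.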

An even bigger point worth discussing is: Where do we draw the line? A user will receive the best possible service if everything about the user is known. This also means that everything about the user is known! And not necessarily known by a trustworthy entity. Google may be trusted as an entity (primarily because it hasn’t done something bad publicly, so far), but not everyone connected with Google can be trusted. Google has grown to such a size and has so many ways to gather information, be it the World Wide Web (Search), the computer (Google Desktop), cloud computing (Google Apps, Gmail) or the mobile[16] (Latitude, Maps), that it has become almost all-pervasive. A single entity with this amount of information, concentrated and ready for use in any desired form, is an extremely useful but risky proposition.

This raises the question: If advertisements can be tailored and made better (at least in theory), is this not a good idea? Imagine advertisements that you really want to see. This is exactly the premise that Google works on. The problem is that there is no guarantee that the information collected and mined for the advertising platform will not be used for other purposes, possibly leading to discrimination or abuse. It is widely known that yield and revenue management systems, such as those used for airline reservations, result in discrimination or unintended results.

Opting-Out & Other Initiatives

Google provides mechanisms for users to opt-out of this information collection and allows users to provide consent (or opt-in) wherever possible – not exactly the best of policies. It is a member of the Network Advertising Initiative[17] (NAI) which provides an effective way for users to opt-out of behavioural advertising[18]. Unfortunately, the way this works is that NAI puts out a special cookie on your machine that tells other advertisers that “this user would prefer not to be tracked. Please abide by the rules!” It does nothing to impose the rules, and advertising members are expected to follow the honour code.
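The weakness of the opt-out cookie can be sketched as follows. The cookie name `OPT_OUT`, its value and the two ad-server functions are hypothetical and are not taken from NAI’s actual implementation. The cookie only records a preference; whether it has any effect depends entirely on the code that reads it.

```python
from http.cookies import SimpleCookie

OPT_OUT_COOKIE = "OPT_OUT"  # hypothetical cookie name


def has_opted_out(cookie_header: str) -> bool:
    """Check whether the browser sent the opt-out preference cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(OPT_OUT_COOKIE)
    return morsel is not None and morsel.value == "1"


def compliant_ad_server(cookie_header: str) -> str:
    """A member that follows the honour code checks the cookie first."""
    return "generic ad" if has_opted_out(cookie_header) else "behavioural ad"


def non_compliant_ad_server(cookie_header: str) -> str:
    """Nothing technical stops a server from ignoring the cookie."""
    return "behavioural ad"


header = "OPT_OUT=1; id=xyz789"
print(compliant_ad_server(header))
print(non_compliant_ad_server(header))
```

A further practical consequence of this design is that a user who clears all cookies also clears the opt-out.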

The Fair Information Practice and NAI’s guidelines on notice and choice for Web Beacons state that:

  1. Any use of Web Beacons, whether through a website or email, requires notice.
  2. Notice must include a disclosure that Web Beacons are being used; the purpose for which the Web Beacons are being used; and, if applicable, a disclosure of any transfer of data to third parties.
  3. Organizations that use Web Beacons to transfer Personally Identifiable Information to a Third Party, for purposes unrelated to the reason for which the Personally Identifiable Information was initially collected, must provide choice for such transfers.
  4. Organizations that use Web Beacons to transfer sensitive information associated with Personally Identifiable Information to a Third Party must obtain explicit consent (opt-in) for such transfers.

The above can be generalized for any of the various tracking mechanisms used online.

Initiatives such as the Platform for Privacy Preferences (P3P) define sensitive information as forms of data that would normally be private – including certain types of health, financial, sexual and political data. They define how to collect Personally Identifiable Information (PII), how to inform users that such information is being collected, and how to provide them with a choice. However, the issue with this choice is that it often is provided in binary: Give us all the information we want, or don’t use the service. This is not the right way to proceed. Unless organizations are able to justify why they absolutely need information to provide the service they are providing, and the user accepts that the need is genuine, there should be no obligation on the user to provide the data. Then again, aggregation services that are based on crowd-sourcing or implicit information gathering systems will fail if the users don’t understand or consent to provide these services the required data. In the end, user communities will have to decide if they value their privacy more than the usability.

The World Privacy Forum[19] (WPF) expressed its concerns and questions over the possible shift of City governments of Washington D.C. and Los Angeles in the US to cloud-computing services offered by Google. The letter concludes with the very valid concern:

“Our concern is that the transfer of so many City records to a cloud computing provider may threaten the privacy rights of City residents, undermine the security of other sensitive information, violate both state and federal laws, and potentially damage vital City legal and other interests.”

An article[20] titled “Google Apps: Are privacy and security concerns being misplaced by the media?” provides excellent commentary. There is a valid argument that in most cases, a service provider such as Google is far more capable of handling security and addressing privacy concerns than an ill-funded or wrongly-administered local project. However, local projects ensure separation of data and systems. Centralized cloud-computing services will logically concentrate data from many such “local” projects – a cause for concern.

Handling Changes

Worldwide, organizations are increasingly being acquired and merged with other entities, more so in the technology business. This means that users and service providers need to handle changes to privacy policies and practices. Google, to its credit, has a reasonably strong “changes to privacy policy”[21] clause:

  • No reduction in privacy without explicit consent
  • More prominent notice in case of significant changes
  • Comparison and review of prior versions possible

This provides reasonable assurance to end-users that Google’s informal motto of “Don’t be evil” is still high up on the priority list. However, if Google does indeed change to a more restrictive privacy policy, and people want to opt out, which in this case could mean ceasing to use Google’s services, will Google be able to guarantee that all user data and related information will be deleted not only from its own data warehouses, but also from those of its partners and affiliates? A lot of organizations, including Google, let this question slide: “we’ll tackle this problem when it presents itself, if it ever does.”

The Last Mile

Google itself undergoes SAS70 Type II audits[22] of its own IT infrastructure, covering the following:

  • Logical security
  • Privacy
  • Data center physical security
  • Incident management and availability
  • Change management
  • Organization and administration

While audits by themselves are not the best way of guaranteeing security, it is hoped that they are performed in the right spirit, that Google therefore has reasonably secure physical server farms, and that no random entity can easily compromise the system, with or without malicious intent.

However, there is little, if any, awareness of the last mile of connectivity between the service provider and the end-user[23], or of the end-users’ own computer and communication systems. By default, Google uses the Hypertext Transfer Protocol (HTTP) instead of HTTPS (HTTP Secure) for all its services, except when logging in. This makes it extremely easy for a packet sniffer, even a freely available out-in-the-wild variety, to access and compromise user data. The cross-site cookies that authenticate all Google services via a single login can also be intercepted and reused by an attacker when they travel over plain HTTP. Users don’t realize that they need to explicitly sign out of services in order to end their session, and Google does not make this easy or obvious for them either.
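What “secure by default” could look like on the provider’s side is sketched below, under stated assumptions: the cookie name `SID`, the example host and both function names are invented for illustration and do not describe Google’s systems. The sketch upgrades plain HTTP links to HTTPS and marks the session cookie so that a browser will not send it over an unencrypted connection.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Rewrite an http:// URL to https://; leave other schemes untouched."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def session_cookie(session_id: str) -> str:
    """Build a Set-Cookie value for a session cookie restricted to HTTPS."""
    cookie = SimpleCookie()
    cookie["SID"] = session_id          # hypothetical cookie name
    cookie["SID"]["secure"] = True      # browser sends it over HTTPS only
    cookie["SID"]["httponly"] = True    # not readable from page scripts
    return cookie["SID"].OutputString()


print(force_https("http://mail.example.com/inbox?view=all"))
print(session_cookie("abc123"))
```

A cookie without the `Secure` attribute is sent with every plain HTTP request to the same site, which is the exposure that makes a packet sniffer on the last mile so effective.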

Because Google is the largest cloud-service provider (and also the largest beneficiary), it should educate the users about security options and make these options explicitly available in easy-to-find places. Google should further refrain from misleading marketing practices, such as issuing statements like “most secure and safe email” without educating users about the risks and loss of rights in using a hosted email service. Google should default to highest security (such as always using encrypted sessions) and allow people to explicitly opt-out of these measures in case they understand the risks and choose so themselves; some extremists go to the extent of saying that Google should refuse access to compromised systems or systems with low security.

In its defense, Google says that the performance impact and the costs of additional security and privacy are not justified. This is, of course, widely contested. Google also states that it uses special-purpose technology as opposed to general-purpose software, exposing only the required and necessary services while switching others off, which makes it more resilient to general internet hacking and most virus attacks than comparable service providers. It also claims to have a team monitoring its servers round the clock, developing and deploying patches in case a vulnerability is detected.

Summary

Online security and privacy is (or should be, given the lack of user awareness) an important issue for any user. Service providers, and especially Google, given its size and all-pervasive offerings, are pushing users to increasingly depend on their services, so much so, that it is often not possible to not use a service such as Google Search. These providers are collecting user information in many ways in order to legitimately improve their service offerings, and provide more customized and personalized content, including advertisements. However, the ease of data sharing, the ease of overlaying and the monopolies are resulting in concerns over the security of the data and privacy of personal information.

There is no simple answer over whether service providers such as Google can be trusted (“Yes”), or completely ignored (“No”). Users by themselves are not capable of making the choice. Consumer watchdogs and monitoring agencies are overwhelmed and often subverted by subtle means, more so as technology evolves and matures. In the end, it appears that the vision and strategy of the service provider is what keeps it from being drawn into the black hole of privacy and security violation. Of course, the concern of a third party using data aggregated by a service provider without its (and its users’) permission and knowledge is a very valid one. The provider itself may have its users’ best interests at heart, but the same cannot be said of its employees or its partners; ultimately everyone has a price, and the provider is only as “clean” as its employees.

Is there a way out for the average user? This author does not believe that a simple straight-forward way exists, at least for now.


[1] http://www.google.com/support/a/bin/answer.py?answer=106876 – Most of the Google statements are taken from their Privacy Policy, Terms and Conditions or from their Help. Where possible, the URL will be quoted.

[2] http://www.brandrepublic.com/News/881894/Facebook-rethinks-data-ownership-change-users-revolt/ – Facebook claimed ownership of uploaded content even after user profiles had been deleted. This granted Facebook lifelong ownership rights to user photos, videos, written content and music.

[3] http://www.wired.com/techbiz/media/news/2005/08/68654 – Flickr required users to create a Yahoo! id and forced them to accept changed terms after it was acquired for an undisclosed sum by Yahoo!

[4] http://www.google.com/support/a/bin/answer.py?answer=106887

[5] http://www.google.com/support/a/bin/answer.py?answer=112448

[6] http://www.google.com/support/a/bin/answer.py?answer=112441

[7] http://www.infoworld.com/d/security-central/google-privacy-exec-faces-criminal-charges-in-italy-345

[8] http://www.google.com/support/a/bin/answer.py?answer=107807

[9] In fact the Patriot Act in the United States specifically grants this right to government and law-enforcing agencies.

[10] http://www.google.com/support/a/bin/answer.py?answer=107819

[11] http://www.google.com/support/a/bin/answer.py?hl=en&answer=134413

[12] http://www.google.com/privacypolicy.html

[13] http://www.google.com/support/a/bin/answer.py?answer=107810

[14] http://www.techcrunch.com/2009/03/07/huge-google-privacy-blunder-shares-your-docs-without-permission/

[15] http://www.google.com/support/a/bin/answer.py?answer=112445

[16] http://www.privacyinternational.org/article.shtml?cmd[347]=x-347-563567 – has an excellent review of the privacy and security flaws with the Google Latitude offering

[17] NAI’s mission:

•  To provide information and a mechanism for consumers to monitor and control their online experience.

•   To provide a platform for the development of standards and policies that reward responsible marketing and the responsible use of data, as well as promote the long-term growth and viability of the Internet as a vibrant marketing channel.

[18] http://www.networkadvertising.org/managing/opt_out.asp

[19] http://clkrep.lacity.org/onlinedocs/2009/09-1714_misc_7-16-09.pdf

[20] http://www.thetechherald.com/article.php/200933/4215/Google-Apps-Are-privacy-and-security-concerns-being-misplaced-by-the-media

[21] http://www.google.com/privacypolicy.html

[22] http://www.google.com/support/a/bin/answer.py?answer=138340

[23] http://www.wired.com/images_blogs/threatlevel/2009/06/google-letter-final2.pdf – Adapted from the “Six Page Letter to Google’s CEO”, signed by academicians
