Business, Public Issues, Technology, Web

An Overview of Online Privacy and Security (With a Focus on Google)

Hrishikesh Thite is a second year management student at the Indian Institute of Management Calcutta. This unabridged article was originally written as a term paper for a course on Analysis of ICT Markets. It is fairly long, so consider yourself forewarned. There is a summary of sorts, at the end, for those who want to briefly visit the gist of the paper.


It is said that the best security is provided by a system that has no access. At all. Unfortunately, such a system would hardly be usable by anyone. There has always been a trade-off between accessibility and security, and between usability and privacy. With an increasing online presence, and with single large corporations owning or collaborating over a multitude of services that govern the always-connected online lifestyle, individual and corporate users need to reevaluate the security and privacy angle. In this term paper, I will examine the policies of Google and its partners in the context of security, privacy and convenience.

There are broadly two areas to consider: the first is the service provider itself, which in this case is Google, and the second is the system that connects the service provider to the end-user, which means the Internet, the users’ local ISP and so on. It could be argued that Google has no control over the network, but there are ways of securing the network, such as the use of HTTPS, that need to be adopted and implemented by the service provider. Yet another classification looks at security from the perspective of raw data and physical access on the one hand, and from the logical or intellectual property perspective on the other.

Google, Privacy & Security

Google states that[1]

  • It does not own any user data
  • It will not share it with any third-party
  • It will allow users to keep it as long as they want
  • It will allow users to extract data from its services

However, most of the above points raise follow-on questions. The first point is a no-brainer. Other providers, such as Facebook[2] and Flickr[3], have tried changing their policies on data ownership, which users vehemently opposed. Google, so far, has continued to use the saner ownership policy. But in a sense, Google does not need to “own” data for its business purposes (discussed in detail later). In the second point, Google refers to its privacy policy (examined later) when it says that it won’t share any data with third parties. Plus, given the many services of Google itself across which data may be (automatically) shared, this author is not sure whether the term third party even makes conventional sense. The third point, about data retention, has an implicit meaning – Google will allow users to keep data on its services so long as: one, Google itself continues to exist, and two, Google finds it economically and financially appealing to allow users to use its service (for free, or almost free). Again, it could be argued that Google is sufficiently large and has deep enough pockets to survive almost anything thrown at it, or that the above would be true of any service provider and has always been the most-debated point of software as a hosted service. The final point is an answer to the doubts raised above, in that Google promises that it will allow users to extract their data from its services at any point in time. It does not, however, explicitly state what happens once the data, or the account itself, is deleted. (It does say that the deletion will take five days, but not whether and how it ensures complete and secure deletion.)

A typical Google account, combined with various Google services, is a rich source of personal information, including financial data such as stock portfolios and credit card details (for purchases via Froogle, for example), location and positioning information (via Latitude and Maps), users’ connections and social networks (via Gmail and Orkut), and user passwords and login information (stored via Wand / AutoFill on the Google Toolbar), apart from the users’ clickstream and other behavioural data.

Employee Access, Abuse & Content Neutrality

Google goes on to say[4] that its employees cannot access user data unless explicit permission is granted, for purposes of troubleshooting, for example. It then also says that employees or automated systems may take down data that violates the terms of service. It is not clear to this author how they can do this without continuous monitoring. It appears that Google’s employees (or automated systems) monitor aggregate figures, such as spikes in bandwidth usage or extreme changes in access patterns, and then investigate those. They already share these statistics and trends with administrators and users of Google Analytics. The other scenario is the case where one user explicitly complains about another user or their data. In both cases, Google’s employees clearly access user data. Google claims[5] that it will contact the primary account administrator in the event that content such as malware, pornography, child pornography, or copyrighted or trademarked content is taken down. This is especially true on its online video sharing service, YouTube. Google is, thus, not content neutral.

Google encourages third parties to report abuse of its systems, and also actively tries to detect violations. It will not ask users for an explanation, or inform them before deleting content, but, as stated above, will contact them after the deletion has taken place. This leaves the system open to abuse. It is not clear from the openly available data whether a user can contest such deletion and / or termination.

Google Apps and its related services allow domain administrators (which should ideally belong to the same organization as the users) to access all end-user accounts and their associated data[6]. This author is not sure if corporate and other general-purpose users are aware of this, or if Google or the account administrator makes any effort to publicize this.

An interesting reverse case: A Google executive got sued[7] for a third-party posting of a video showing a disabled teen being harassed by peers. The exact charge is “defamation and failure to exercise control over personal data.” How Google is supposed to monitor its vast information repositories without violating its own privacy and security guidelines, and whether it is responsible for user-generated content at all, are both debatable questions.

Google & the Law

While Google tries to function like a Swiss bank, untouchable and high-up-there, it also tries to play it safe and by the law (emphasis mine):

“Google does not share or reveal private user content such as email or personal information with third parties except as required by law, on request by a user or system administrator, or to protect our systems. These exceptions include requests by users that Google’s support staff access their email messages in order to diagnose problems; when Google is required by law to do so; and when we are compelled to disclose personal information because we reasonably believe it’s necessary in order to protect the rights, property or safety of Google, its users and the public.”[8]

There are many issues with the above: Google reserves the right to determine if it is necessary to disclose personal information, without explicitly informing the user before such an incident occurs. It puts this in its privacy policy, and because users have accepted the policy, they are deemed to have given away their right to protect their personal information. Google will give out any user information to a law enforcement agency or a government entity, leading to Big Brother scenarios common in the Western nations.

The only real protection in the light of the above scenarios is security by obscurity. If you are not an “interesting individual”, your data is safe. There are ways to secure your data and still continue to use Google services, such as using local encryption or an additional layer of password protection, but this will only protect the data itself. Meta-data from communications (to, from, time-data etc.) will still remain exposed. The meta-data is enough[9] for the agencies to sue you or arrest you and force you to provide them the access codes or passwords anyway.
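The point about meta-data can be made concrete with a minimal sketch. It assumes a one-time-pad XOR as a stand-in for whatever local encryption a user might apply; the addresses and the helper names (`build_encrypted_message`, `visible_to_provider`) are hypothetical and not part of any Google service. Only the body is protected; the routing headers have to stay readable for the mail to be delivered at all.

```python
import secrets
from email.message import EmailMessage


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (one-time pad; never reuse the key)."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def build_encrypted_message(sender: str, recipient: str, body: str):
    """Return (message, key). Only the body is encrypted; headers stay readable."""
    plaintext = body.encode("utf-8")
    key = secrets.token_bytes(len(plaintext))
    ciphertext = xor_bytes(plaintext, key)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "(encrypted)"
    msg.set_content(ciphertext.hex())  # hex so the ciphertext travels as text
    return msg, key


def visible_to_provider(msg: EmailMessage) -> dict:
    """What anyone handling the message can read without the key."""
    return {"from": msg["From"], "to": msg["To"]}


# Hypothetical addresses, for illustration only.
msg, key = build_encrypted_message(
    "alice@example.com", "bob@example.com", "meet at noon"
)
metadata = visible_to_provider(msg)
recovered = xor_bytes(bytes.fromhex(msg.get_content().strip()), key).decode("utf-8")
```

Here `metadata` still names both parties even though the body is unreadable without `key`, which is exactly the exposure described above.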

Google is registered (via a $200 self-certification?) with the U.S. Safe Harbor whose privacy principles include Notice, Choice, Onward Transfer, Security, Data Integrity, Access and Enforcement[10]. But an organization must decide itself if Google is compatible with all the laws and regulations it may be subjected to. In other words, it passes the buck to the organization (or individual) using Google services.

Google’s abuse policy[11] goes on to state that:

•  As a provider of content creation tools and hosting services, Google is not in a position to mediate or adjudicate disputes between third parties. For matters involving trademarks or impersonation, we recommend that you raise your concerns directly with the creator of the content in question.

•  If you believe that someone is impersonating your identity for the purposes of fraud, we recommend that you contact law enforcement or consumer protection authorities.

•  If you choose to pursue legal action against the content owner, we are prepared to accept valid court orders concerning the content in question. To submit a valid court order, please contact us at

The above shows that Google is trying to be content neutral, so long as the law allows it to be. This is contrary to what we saw in an earlier statement, where Google said that it can choose to delete user data.

Handling User Information

In its note about the way Google handles user information[12] (emphasis mine):

“We may combine the information you submit under your account with information from other Google services or third parties in order to provide you with a better experience and to improve the quality of our services. For certain services, we may give you the opportunity to opt out of combining such information.

We use cookies to improve the quality of our service, including for storing user preferences, improving search results and ad selection, and tracking user trends, such as how people search. Google also uses cookies in its advertising services to help advertisers and publishers serve and manage ads across the web.”

From another page: Data is scanned and indexed[13] for user’s own search, for spam filtering and virus detection and for displaying context sensitive ads. However, processes are automated, involve no human interaction, and data is not part of the general index (unless information is explicitly published). In yet another case, a software issue resulted in some users’ documents in Google Docs being shared with their contacts without the users’ permission[14]. This may be unintentional, but is definitely possible.

The problem is compounded by combining information from various sources or “mashing”. Most online data is believed to exist in databases that by themselves do not contain full or meaningful information. However, in a bid to improve and integrate user experience and the quality of its services, Google routinely brings all of this information together and delivers it via its many offerings. It also actively tracks this information, though it claims it does this only at an aggregate level. However, some of its services such as the Web History and Search History delivered via the Google Toolbar will store the users’ click-stream in extreme detail with a Google Account. It does this with the users’ consent and it is generally understood that the user opted to provide these details, but does the user really comprehend the implications of this blatant and unrestricted information sharing?
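A rough sketch of why “mashing” matters, using made-up records: each log below is sparse on its own, but joining them on a shared account identifier yields a single profile. The service names, fields and values are invented for illustration and do not describe how Google actually stores or joins data.

```python
from collections import defaultdict

# Hypothetical per-service records; each one says little by itself.
search_log = [{"account": "u1", "query": "knee surgery cost"}]
location_log = [{"account": "u1", "place": "Kolkata"}]
contacts = [{"account": "u1", "contact": "friend@example.com"}]


def build_profiles(*sources):
    """Merge records from several sources into one profile per account."""
    profiles = defaultdict(lambda: defaultdict(list))
    for records in sources:
        for record in records:
            account = record["account"]
            for field, value in record.items():
                if field != "account":
                    profiles[account][field].append(value)
    # Convert the nested defaultdicts to plain dicts for readability.
    return {acct: dict(fields) for acct, fields in profiles.items()}


profiles = build_profiles(search_log, location_log, contacts)
```

After the merge, `profiles["u1"]` holds a health-related query, a location and a contact in one place, which none of the three sources revealed individually.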

Third Party Add-ons & Cross-Site Sharing

Google publishes an Application Programming Interface (API) that allows third party (external) developers to create applications that use Google services, and possibly user data, with the idea that this will further add value to the overall offering. Google does not have an explicit program for certifying these application developers. It does not guarantee that all of these add-ons, widgets or apps are free from malicious intent. Postini[15] is a reputed service that can be added to Google Apps in order to add email message security, discovery, archival and recovery. It allows companies to comply with regulations such as the Sarbanes-Oxley Act. However, it also exposes user data to a completely different entity, even though the end-user feels and for all practical purposes believes that she is interacting only with Google. Facebook and Orkut third-party applications, for example, get access to your entire social network once installed. Aggregators and API publishers such as Google make it extremely easy to install such add-ons. There is simply no control over what such an application can do; it is largely based on “good old trust” and “good faith efforts”, more so if it is closed source and unapproved.

Google does not use web beacons. However, they present an interesting case. Web beacons are small one-pixel wide images that allow sites to better understand the traffic patterns within their (and network-affiliated) domains and, subsequently, adjust their content to better respond to their visitors’ interests. By mining “conversations” between the web and ad servers and the users’ browser, advertisers can tune their marketing campaigns for optimal effectiveness. Google’s cookies and partner network’s web beacons are cross-site and cross-browser, some even work with rich-email clients and RSS feed readers. This means that Google’s partners and Google can collaborate to serve advertisements to users and collect an even larger chunk of user information, by observing user behavior over a much larger spectrum of users’ online life. The undesirable side-effect, of course, is that they also uniquely identify a user, and can easily link all online activity to an email or user account.
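To illustrate the mechanism, here is a minimal sketch of how a beacon carries information. The tracker host, the parameter names and the cookie value are all hypothetical; real ad networks use their own formats.

```python
from html import escape
from urllib.parse import urlencode, urlsplit, parse_qs

TRACKER = "https://tracker.example/pixel.gif"  # hypothetical ad-network endpoint


def beacon_url(page: str, campaign: str) -> str:
    """URL of the invisible image; the query string carries the context."""
    return f"{TRACKER}?{urlencode({'page': page, 'campaign': campaign})}"


def beacon_tag(page: str, campaign: str) -> str:
    """The 1x1 image tag a publisher would embed in a page or an email."""
    src = escape(beacon_url(page, campaign), quote=True)
    return f'<img src="{src}" width="1" height="1" alt="">'


def what_the_tracker_learns(request_url: str, cookie_header: str) -> dict:
    """Reconstruct what a single image request reveals to the tracker."""
    params = parse_qs(urlsplit(request_url).query)
    return {
        "page": params["page"][0],
        "campaign": params["campaign"][0],
        "visitor_cookie": cookie_header,  # sent automatically by the browser
    }


tag = beacon_tag("/news/article-42", "spring-sale")
learned = what_the_tracker_learns(
    beacon_url("/news/article-42", "spring-sale"), "uid=abc123"
)
```

Because the same cookie accompanies every request to the tracker's domain, visits to unrelated sites that embed the same beacon can be tied to one visitor.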

An even bigger point worth discussing is: Where do we draw the line? A user will receive the best possible service if everything about the user is known. This also means that everything about the user is known! And not necessarily known by a trustworthy entity. Google may be trusted as an entity (primarily because it hasn’t done anything bad publicly, so far), but not everyone connected with Google can be trusted. Google has grown to such a size and has so many ways to gather information, be it the World Wide Web (Search), the computer (Google Desktop), cloud computing (Google Apps, Gmail) or the mobile[16] (Latitude, Maps), that it has become almost all-pervasive. A single entity with this amount of information, concentrated and ready for use in any desired form, is an extremely useful but risky proposition.

This raises the question: If advertisements can be tailored and made better (at least in theory), is this not a good idea? Imagine advertisements that you really want to see. This is exactly the premise that Google works on. The problem is that there is no guarantee that all the information collected and mined for the advertising platform will not be used for other purposes, possibly leading to discrimination or abuse. It is widely known that yield and revenue management systems, such as those used for airline reservations, result in discrimination or unintended results.

Opting-Out & Other Initiatives

Google provides mechanisms for users to opt-out of this information collection and allows users to provide consent (or opt-in) wherever possible – not exactly the best of policies. It is a member of the Network Advertising Initiative[17] (NAI) which provides an effective way for users to opt-out of behavioural advertising[18]. Unfortunately, the way this works is that NAI puts out a special cookie on your machine that tells other advertisers that “this user would prefer not to be tracked. Please abide by the rules!” It does nothing to impose the rules, and advertising members are expected to follow the honour code.
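The weakness of a cookie-based opt-out can be sketched as follows. The cookie name `optout` and both tracker functions are hypothetical; the sketch only shows that the cookie is a request, and that honouring it is a decision each advertiser's code makes for itself.

```python
from http.cookies import SimpleCookie

OPT_OUT_NAME = "optout"  # hypothetical cookie name, not NAI's actual one


def has_opted_out(cookie_header: str) -> bool:
    """True if the browser sent the opt-out cookie with value 1."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return OPT_OUT_NAME in cookie and cookie[OPT_OUT_NAME].value == "1"


def compliant_tracker(cookie_header: str, event: str, log: list) -> None:
    """Follows the honour code: records nothing for opted-out users."""
    if not has_opted_out(cookie_header):
        log.append(event)


def noncompliant_tracker(cookie_header: str, event: str, log: list) -> None:
    """Ignores the cookie entirely; nothing technical prevents this."""
    log.append(event)


header = "optout=1; uid=abc123"
good_log, bad_log = [], []
compliant_tracker(header, "viewed /shoes", good_log)
noncompliant_tracker(header, "viewed /shoes", bad_log)
```

The two logs differ only because one function chose to check the cookie, which is the gap between an honour code and an enforced rule.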

The Fair Information Practice and NAI’s guidelines on notice and choice for Web Beacons state that:

  1. Any use of Web Beacons, whether through a website or email, requires notice.
  2. Notice must include a disclosure that Web Beacons are being used; the purpose for which the Web Beacons are being used; and, if applicable, a disclosure of any transfer of data to third parties.
  3. Organizations that use Web Beacons to transfer Personally Identifiable Information to a Third Party, for purposes unrelated to the reason for which the Personally Identifiable Information was initially collected, must provide choice for such transfers.
  4. Organizations that use Web Beacons to transfer sensitive information associated with Personally Identifiable Information to a Third Party must obtain explicit consent (opt-in) for such transfers.

The above can be generalized for any of the various tracking mechanisms used online.

Initiatives such as the Platform for Privacy Preferences (P3P) define sensitive information as forms of data that would normally be private – including certain types of health, financial, sexual and political data. They define how to collect Personally Identifiable Information (PII), how to inform users that such information is being collected, and how to provide them with a choice. However, the issue with this choice is that it often is provided in binary: Give us all the information we want, or don’t use the service. This is not the right way to proceed. Unless organizations are able to justify why they absolutely need information to provide the service they are providing, and the user accepts that the need is genuine, there should be no obligation on the user to provide the data. Then again, aggregation services that are based on crowd-sourcing or implicit information gathering systems will fail if the users don’t understand or consent to provide these services the required data. In the end, user communities will have to decide if they value their privacy more than the usability.

The World Privacy Forum[19] (WPF) expressed its concerns and questions over the possible shift of City governments of Washington D.C. and Los Angeles in the US to cloud-computing services offered by Google. The letter concludes with the very valid concern:

“Our concern is that the transfer of so many City records to a cloud computing provider may threaten the privacy rights of City residents, undermine the security of other sensitive information, violate both state and federal laws, and potentially damage vital City legal and other interests.”

An article[20] titled “Google Apps: Are privacy and security concerns being misplaced by the media?” provides excellent commentary. There is a valid argument that in most cases, a service provider such as Google is far more capable of handling security and addressing privacy concerns than an ill-funded or wrongly-administered local project. However, local projects ensure separation of data and systems. Centralized cloud-computing services will logically concentrate data from many such “local” projects – a cause for concern.

Handling Changes

Worldwide, organizations are increasingly being acquired and merged with other entities, more so in the technology business. This means that users and service providers need to handle changes to privacy policies and practices. Google, to its credit, has a reasonably strong “changes to privacy policy”[21] clause:

  • No reduction in privacy without explicit consent
  • More prominent notice in case of significant changes
  • Comparison and review of prior versions possible

This provides reasonable assurance to end-users that Google’s unwritten policy of “Do-No-Evil” is still high up on the priority list. Except that if Google does indeed change to a more restrictive privacy policy, and people want to opt out, which in this case could mean that they stop using Google’s services, will Google be able to guarantee that all user data and related information will be deleted not only from its own data warehouses, but also from those of its partners and affiliates? A lot of organizations, including Google, let this question slide: “we’ll tackle this problem when it presents itself, if it ever does.”

The Last Mile

Google itself undergoes SAS70 Type II audits[22] covering the following for its own IT infrastructure.

  • Logical security
  • Privacy
  • Data center physical security
  • Incident management and availability
  • Change management
  • Organization and administration

While audits by themselves are not the best way of guaranteeing security, it is hoped that they are performed in the right spirit, that Google therefore has reasonably secure physical server farms, and that no random entity can easily compromise the system, with or without malicious intent.

However, there is little, if any, awareness of the last mile of connectivity between the service provider and the end-user[23], and of the end-users’ computer and communication systems. Google, by default, uses the Hyper Text Transfer Protocol (HTTP) instead of HTTPS (secure) for all its services (except when logging in). This makes it extremely easy for a packet sniffer, even a freely available out-in-the-wild variety, to access and compromise user data. The cross-site cookies that authenticate all Google services via a single login can also be easily compromised. Users don’t realize that they need to explicitly sign out of services in order to end their session. Google does not make it easy and obvious for them to realize this either.
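As a hedged illustration of what a more secure default could look like, the sketch below builds a session cookie marked `Secure` and `HttpOnly`, and mimics the rule a browser applies before sending it. The cookie name, session value and URLs are made up; this is not a description of Google's implementation.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def session_cookie_header(session_id: str) -> str:
    """Value of a Set-Cookie header for a session cookie with safer flags."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True    # only sent over HTTPS
    cookie["session"]["httponly"] = True  # hidden from page scripts
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


def may_send_cookie(url: str, secure_flag: bool) -> bool:
    """Browser-side rule: a Secure cookie never travels over plain HTTP."""
    return urlsplit(url).scheme == "https" or not secure_flag


header = session_cookie_header("abc123")
over_http = may_send_cookie("http://mail.example/inbox", secure_flag=True)
over_https = may_send_cookie("https://mail.example/inbox", secure_flag=True)
```

Without the `Secure` flag the same cookie would accompany plain HTTP requests, where a packet sniffer on the local network could read it and reuse the session.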

Because Google is the largest cloud-service provider (and also the largest beneficiary), it should educate the users about security options and make these options explicitly available in easy-to-find places. Google should further refrain from misleading marketing practices, such as issuing statements like “most secure and safe email” without educating users about the risks and loss of rights in using a hosted email service. Google should default to highest security (such as always using encrypted sessions) and allow people to explicitly opt-out of these measures in case they understand the risks and choose so themselves; some extremists go to the extent of saying that Google should refuse access to compromised systems or systems with low security.

In its defense, Google says that the performance impact and the costs of additional security and privacy are not justified. This is, of course, largely contested. Google also states that it uses special-purpose technology as opposed to general-purpose software, which exposes only the required and necessary services while switching others off, making it more resilient to general internet hacking and most virus attacks than comparable service providers. It also claims to have a team monitoring its servers round-the-clock, developing and deploying patches in case a vulnerability is detected.


Online security and privacy is (or should be, given the lack of user awareness) an important issue for any user. Service providers, and especially Google, given its size and all-pervasive offerings, are pushing users to increasingly depend on their services, so much so, that it is often not possible to not use a service such as Google Search. These providers are collecting user information in many ways in order to legitimately improve their service offerings, and provide more customized and personalized content, including advertisements. However, the ease of data sharing, the ease of overlaying and the monopolies are resulting in concerns over the security of the data and privacy of personal information.

There is no simple answer as to whether service providers such as Google can be trusted (“Yes”), or completely ignored (“No”). Users by themselves are not capable of making the choice. Consumer watchdogs and monitoring agencies are overwhelmed and often subverted by subtle means, more so as technology evolves and matures. In the end, it appears that the vision and strategy of the service provider are what keep it from being drawn into the black hole of privacy and security violation. Of course, the concern of a third party using data aggregated by a service provider without its (and its users’) permission and knowledge is a very valid one. The provider itself may have its users’ best interests at heart, but the same cannot be said of its employees or its partners; ultimately everyone has a price, and the provider is only as “clean” as its employees.

Is there a way out for the average user? This author does not believe that a simple, straightforward way exists, at least for now.

[1] – Most of the Google statements are taken from their Privacy Policy, Terms and Conditions or from their Help. Where possible, the URL will be quoted.

[2] – Facebook claimed ownership of uploaded content even after user profiles had been deleted. This granted Facebook lifelong ownership rights to user photos, videos, written content and music.

[3] – Flickr required users to create a Yahoo! id and forced them to accept changed terms after it was acquired for an undisclosed sum by Yahoo!






[9] In fact the Patriot Act in the United States specifically grants this right to government and law-enforcing agencies.







[16][347]=x-347-563567 – has an excellent review of the privacy and security flaws with the Google Latitude offering

[17] NAI’s mission:

•  To provide information and a mechanism for consumers to monitor and control their online experience.

•   To provide a platform for the development of standards and policies that reward responsible marketing and the responsible use of data, as well as promote the long-term growth and viability of the Internet as a vibrant marketing channel.






[23] – Adopted from the “Six Page Letter to Google’s CEO”, signed by academicians

Marketing, Strategy, Web

Why Amazon Is Yet To Capture The Indian Market..and Mindset

Having reached the USA, one of the most visible things I saw in my first month here was how much everyone relied on Amazon.
Professors buy books every week from Amazon.
Students buy books from Amazon.
Students buy used books from Amazon.

People shop on Amazon. For virtually all of their needs.

I was tempted to think about why it's not the same in India. Why do we rarely hear students or professionals talk of buying from Amazon?
Of course there is a lack of widespread internet connectivity. A large fraction of the population still doesn't have access to a computer, let alone the internet.

But even in urban areas, cities and universities where internet connectivity is at an all time high, ecommerce of the scale that Amazon promises…is yet to take off. Some reasons:

1) Large scale proliferation of used book vendors on the streets.
2) Lack of secure or trusted or reliable ecommerce payment channels.
3) Slow traction from partner banks in providing support to ecommerce payment products
4) Apathy to books: fundamentally, the Indian community has been built on a system of rote learning in the classroom. Students are happy to read their class-notes, and write exams. They get good grades. And they are happy. There is a lack of appetite for true and deep knowledge. And this shows in the lack of interest to actually search and buy books.
5) Lack of Indian retailers: Few bookshops in India have a full-fledged web commerce presence. So most customers have to have the books/items shipped from global warehouses [I am guessing that most are in the USA]. This adds to the already high cost. So people would rather buy the books in person.
6) Sharing: Indian mindsets are fundamentally built on the culture of sharing. Students share books/materials. And encourage reuse of books/materials by handing them over to their juniors once they use them. There is hardly any reaction to the editions getting old. Students just like to get the crux of the book.

Hmmm..something for Amazon to think about!
Over here in the USA, it's A-Z and Amazon!!


IIM, Planning, Public Issues, Strategy, Web

Why CAT should always be offline

Being a web enthusiast, I never thought I would ever write a post saying why something shouldn’t go online.

For starters, CAT is NOT like GMAT. CAT is a relative scoring examination. People gain admission to colleges because they perform relatively better than others in the CAT exam. GMAT scores are always used on a very different plane – you only use them to make an approximate cut-off for selection, more like a health score, but nothing more. You would rarely find someone missing out on admission to a college that considers GMAT scores because of having just missed the score. And that’s what CAT is all about. You can actually miss getting a call by as little as 0.01 percentile.
By spacing CAT over 10 days, the very beauty of CAT is being spoilt. It is supposed to be an exam that lakhs of people take at one go, and based on their relative performance, a set of people are selected for interviews. By spacing the exam over 10 days, and choosing different questions from a question bank of 3000, the complete concept of relative scoring is gone.

Here are the problems I see.
1. Problem of different question papers : How is it fair to have different question papers for different people when they are finally going to be treated alike?
2. Ok so let’s say that the CAT Admission team says that we are going to ensure we give a proportional quantity of each type of questions (Easy, Medium, Hard, and Vague as they normally do) in each paper. So after day 1, you will have 100 people from TIME and IMS advertising on their sites about the number of questions of each type that a candidate can expect. Won’t this give a student a different amount of information (which could be either beneficial or harmful to the candidate’s performance)?
3. The biggest question that still exists is – When there were so many issues with four different sequences of questions in CAT 2006, with people all over India saying it was unfair, how will differences be matched now?
4. Say no errors come in this online version. Then will we have percentiles for deciding selection criteria? How will they normalize the scores? Or will it be per day percentiles? Per day percentiles do not make sense as then the question would be that maybe all the smart people took the test on the same day, and hence allocation of people would be an issue.
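The per-day percentile concern in point 4 can be shown with a small worked example. The scores below are invented, and `percentile` uses one simple definition (percentage of the cohort scoring strictly below a candidate); the method CAT actually uses to compute or normalize percentiles is not stated here, so this is only a sketch of the problem.

```python
from bisect import bisect_left


def percentile(score, cohort):
    """Percentage of the cohort that scored strictly below `score`."""
    ranked = sorted(cohort)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


# Hypothetical raw scores; day 2 happens to draw a stronger group.
day1 = [40, 55, 60, 72, 90]
day2 = [70, 75, 80, 85, 95]

score = 72
within_day1 = percentile(score, day1)     # ranked only against day 1
within_day2 = percentile(score, day2)     # same score, had it been day 2
pooled = percentile(score, day1 + day2)   # ranked against everyone
```

With these numbers the same raw score of 72 lands at 60.0 within day 1, 20.0 within day 2 and 40.0 in the pooled ranking, so the day a candidate happens to sit the test changes the outcome.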

All these problems come up because we still do not have enough infrastructure, as was pointed out by Shubham earlier. Until computer operations scale up in India to such an extent that both computer availability and online security can be made effective, I think offline CAT makes most sense.

IIM, Marketing, Planning, Strategy, Web

CAT goes online!

The news of the Common Admission Test (CAT) going online has been doing the rounds for the past couple of years. This year, during the convocation at IIM Calcutta, it was announced by the Director, Prof. Shekhar Chaudhary. To best explain CAT, I quote from Wikipedia:

The Common Admission Test (CAT) is an all-India test conducted by the Indian Institutes of Management (IIMs) as an entrance exam for the management programmes of its seven business schools. About 250,000 students took CAT in 2008 for about 1500 seats in the IIMs. This is said to make the IIMs more selective than the Ivy League Universities.

Over the years the number of students giving the Common Admission Test (CAT) has steadily increased. The situation has reached a level where almost every graduate in the country fills in the CAT admission form and gives the test, because the exam is based on multiple-choice questions and a little bit of luck can easily get you a high percentile. Almost 300,000 students gave the exam last year and this number keeps on increasing.

Thus Indian Institutes of Management (IIMs) have taken the bold step to enter the new territory of taking CAT online. How does this affect all of us? Well for all the aspirants it means a big change!

  • CAT Test Fee will probably skyrocket! Prometric ETS has been chosen as the partner to conduct the exam. ETS conducts GRE / GMAT / TOEFL exams in India and charges more than $100 for each exam. CAT exam cost was Rs 1,300 last year.
  • Preparation Costs will go up! As an ET article rightly points out, all the coaching institutes will install new exam training systems and this extra cost will be passed on to the students.
  • Exam Patterns will change! Reading Comprehension becomes tougher sitting in front of a computer and thus it will need to be changed appropriately. Data Interpretation and Quant portions will need to be changed as well to facilitate the computer based test.
  • Online CAT Training should increase – with the whole exam going online, more and more ventures to provide online training will be around.

I am sure the CAT Committee will find a good solution to the problem, but I am quite amazed at how they will manage to conduct this exam over just 10 days. 250,000 students want to give CAT, and with just 10 days to choose from, around 25,000 will be giving CAT on each day. This just does not seem feasible with the current infrastructure in place. Maybe they are relying on the fact that, with increased fees for CAT, a good number of applicants would be driven away, and then conducting the exam might be feasible.
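The arithmetic behind the 25,000-a-day figure, as a short sketch. The candidate count and the 10-day window come from the post; the number of sessions per day is purely an assumption for illustration, since the actual schedule is not stated.

```python
import math

candidates = 250_000   # from the post
test_days = 10         # from the post

per_day = candidates // test_days


def seats_needed(candidates_per_day: int, sessions_per_day: int) -> int:
    """Computer seats required nationwide if each seat is reused every session."""
    return math.ceil(candidates_per_day / sessions_per_day)


# Assumption: two sessions a day (hypothetical).
seats_two_sessions = seats_needed(per_day, sessions_per_day=2)
```

Even under the assumed two sessions a day, about 12,500 working seats would have to be available simultaneously across the country.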

Another interesting question is why IIMs and other Indian B-Schools can’t start accepting GMAT scores. If they are conducting the exam online through Prometric ETS (which also conducts GMAT), then maybe accepting an already established exam score makes more sense. The only problem might be that in GMAT a good number of people score in the 700s and above, so it will be difficult to screen the applicants for Group Discussion / Personal Interview.

Anyways all the best for CAT this year! Do share your experiences after the exam.
