Re-engineering News with Technology

Years ago, in college, I went to a presentation by a big internet company, as part of a recruitment event. At the time, I was working at the college newspaper, and the talk was about their “front page”. They said it was the biggest news site at the time, so I was excited.

The bulk of the talk was technical. But the presenter mentioned that one of the biggest challenges was staying ahead of what they called the “National Enquirer effect”. The problem, as the presenter described it, was this. The main goal of the front page is to drive traffic to other properties, and the system was always optimizing both the selection of content on the front page and its ordering based on raw clicks. The presenter said that, while no one admits to it, the content with the best clickthrough rate was always “bikini women”, so left alone, the algorithms would turn the front page into the National Enquirer. Ironically, this meant that over a long enough period, no one would visit. They said they were trying to fix this with some longer-term optimizations, but for now, there was essentially a team for each locale that monitored the site and kept it “clean”.
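The dynamic is easy to sketch in code. What follows is a toy illustration, not the company's actual system: the stories, the click counts, and the `quality` score are all made up, and `rank_by_clicks` and `rank_blended` are hypothetical helpers. The first ranks purely on raw clickthrough rate; the second mixes in an assumed editorial score, roughly what the per-locale teams were doing by hand.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    impressions: int
    clicks: int
    quality: float  # hypothetical editorial score in [0, 1], not from the talk

    @property
    def ctr(self) -> float:
        # Clickthrough rate: clicks per impression.
        return self.clicks / self.impressions if self.impressions else 0.0


def rank_by_clicks(stories):
    """Order stories by raw clickthrough rate, highest first."""
    return sorted(stories, key=lambda s: s.ctr, reverse=True)


def rank_blended(stories, quality_weight=0.5):
    """Order stories by a mix of normalized CTR and the assumed quality score."""
    top_ctr = max(s.ctr for s in stories) or 1.0

    def score(s):
        return (1 - quality_weight) * (s.ctr / top_ctr) + quality_weight * s.quality

    return sorted(stories, key=score, reverse=True)


# Made-up numbers, purely for illustration.
stories = [
    Story("Bikini photo gallery", 10_000, 900, quality=0.1),
    Story("Election explainer", 10_000, 400, quality=0.9),
    Story("Local budget vote", 10_000, 150, quality=0.8),
]

print([s.title for s in rank_by_clicks(stories)])
print([s.title for s in rank_blended(stories)])
```

With these numbers, ranking on clicks alone puts the gallery on top, while the blended score moves the explainer above it. The hard part, of course, is where the quality score comes from.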

A couple of days ago, I saw a tweet about a NYTimes wedding post. It said “Trevor George asked Morgan Sarner out to dinner 10 nights in a row, and won her heart”. The person whose retweet I saw said she’d probably get a restraining order. It was funny, and I “liked” it, but it seemed odd that the NYTimes would promote such creepy behavior. So I clicked on the link. It turns out the groom did not just ask her out 10 nights in a row, but actually took her out on all of them. It’s a minor difference between the text in the tweet and the actual content, but it was enough to get that person to retweet it mockingly and, more ironically, to get me to click on it.

Ev Williams, the co-founder of Twitter and Medium, likened the algorithms that govern the internet to a deus ex machina that provides you with the most extreme version of what you want: you think car crashes are interesting? Here’s a pile-up! It feels true, and definitely explains the long-winded global nausea.

Looking at it another way, though, this is just a specific application of the paperclip maximizer. Instead of the natural resources of the earth, we are just mining minds. And instead of making more paperclips, we are just making some people in the Bay Area richer. I live in the Bay Area, for now, so of course I shouldn’t complain.

But what’s really missing from the debate is how technology has failed to find a way to attract the attention of readers without sacrificing the content. And while some of the damage is done automatically, some of it is self-inflicted.

There are structural and economic explanations for the problem. The internet first destroyed the newspapers’ monopoly on advertising. Then came the glut of content, with the democratization of publishing tools, further pushing down the value of any individual work. Unbundling pieces from the newspapers and magazines that carried them reduced the value of the brand, and in turn the value of the pieces that make up the bundle too. As social media platforms further flattened all content into the same structure, be it from The New York Times or some kids from Macedonia, any semblance of product differentiation disappeared.

The number of knobs publishers have is dwindling, and their editorial decisions are one of their last levers. When you are competing with so much content, and you don’t control how your content is distributed, your only option is to change your content to fit your distribution channels. Legitimate news organizations have long erected a wall between “the church and the state”, or rather “editorial and advertising”, but at least in terms of packaging, the wall no longer exists. The only difference is that it’s not the advertisers that determine your content now, but your distributors.

I saw this first hand too. At Digg, we would casually tell big publishers and famous individuals alike that if they worded headlines a specific way, they would get more clicks. Sometimes it worked, sometimes it didn’t. Google has entire guides, mostly technical but with editorial hints, on how to get more traffic. Facebook does it too, but for slightly different reasons. They want people to click on the content, but not too much, so publishers had better avoid clickbait titles. And of course, most publishers, especially smaller ones that do not have big subscription revenues or rich patrons to back them, get in line.

This is not a jab at newspapers, although it is that a bit. My real qualm is that we still don’t have a proper way to consume the news where the hook doesn’t dictate the content. We built search engines that can scour the entire web in less than a second, but I still can’t figure out whether a piece of content is worth my time, or is just fluff. I can take a virtual tour across the globe, but I cannot tell what a federal policy change means for me as a resident in California. The primary problem is funding and revenue, but is there a lack of imagination as well?

I also don’t know if the solutions to these problems will exist on the supply side, or the demand side. Probably, it will need to be both. Publishers need ways to authenticate and brand their content, and consumers need reading experiences that respect those. Moreover, consumers need a better way to find and consume content that respects the integrity of it, and not let it be violated for distribution.

There are a lot of attempts to build a new stack for consuming news. Services like Blendle attempt to fix monetization by removing the hurdle of micropayment, and also consolidate subscriptions. Facebook and Google try various things too; AMP is a way to clear up the reading experience (and cynically move more of the content to Google servers), Facebook’s Instant Articles is a more locked-in and heavy-handed way of doing the same. Both Facebook and Google also want to help publishers gain more subscribers, and the subscribed users to have more fluid, integrated experiences on the web with their own platforms.

And of course, publishers try their hand too. One of my personal favorites is what Axios does, with its telegraphic, lightly structured way of presenting content. It feels respectful of my time as a reader, and cuts through the fluff without sounding too clinical. I wish more publishers experimented, like them, with radically different but still thoughtful ways of producing and presenting content.

At that talk I went to, they said one of the ideas was to have a fluff lever: slide it to one side and you get practically smut. Slide it all the way to the other, and it’s all dreary politics, which wasn’t smut at the time. As far as I know, they never launched it.

The internet has undermined, intentionally or not, the workings of all news organizations. It took over their advertising and their users’ attention, and now a few companies are inadvertently guiding more of the content too. The different responses to this change lie across the political spectrum. What is common, though, is that the problems will not go away, and the economics that govern newspapers will not go back to where they were. But maybe there are ways to attack this problem with technology, as well as with policy.

I am not sure if that is the answer, but maybe it could be worth trying.

Fake News is an attention economy problem

A common theme of this blog is that history repeats itself. There are some fundamental dynamics of information that are innate to the internet, and most companies coast on those trends. There are occasional shifts, like the smartphone with its always-on connectivity and sensors, but things more or less follow certain trends.

The recent rise of “fake news”, the cheap information that plagues every platform and that Facebook, and to a smaller degree Google, is dealing with, has precedents. It can be explained (and predicted, as many did) with a basic look at the economics of attention, which is another theme of this blog. Being somewhat reductionist, the problem can be viewed as a spam issue on steroids. I admit the integrity of presidential elections is a more serious problem than loss of productivity, but a more sterile approach might help us come to some immediate solutions.

Facebook might be the punching bag these days for everyone, especially journalists, but Google has had its fair share of spam issues. Not too long ago, around 2009, the Mountain View company was fighting a fierce war against what were then called “content farms”. These companies would basically figure out the trending Google searches, create extremely cheap content really fast, do some SEO magic, and get traffic from Google, against which they could sell ads. As long as your cost of production was lower than your revenue from ads, you were golden.
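The arithmetic behind that last sentence fits in a few lines. This is a back-of-the-envelope sketch with made-up numbers, not anyone's real figures; `content_farm_profit` and `breakeven_pageviews` are hypothetical helpers, and RPM here means ad revenue per 1,000 pageviews.

```python
def content_farm_profit(pageviews: int, rpm_usd: float, cost_usd: float) -> float:
    """Profit for one article: ad revenue minus production cost.

    rpm_usd is the ad revenue earned per 1,000 pageviews.
    """
    revenue = pageviews / 1000 * rpm_usd
    return revenue - cost_usd


def breakeven_pageviews(rpm_usd: float, cost_usd: float) -> float:
    """Pageviews needed before an article pays for itself."""
    return cost_usd / rpm_usd * 1000


# Hypothetical numbers: a $15 article earning $2 per 1,000 views.
print(content_farm_profit(pageviews=20_000, rpm_usd=2.0, cost_usd=15.0))  # 25.0
print(breakeven_pageviews(rpm_usd=2.0, cost_usd=15.0))  # 7500.0
```

The lever that matters is `cost_usd`: push it low enough and almost any trending search term clears the break-even line.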

This was a big, lucrative business. The biggest player in this game, the aptly named Demand Media, was a billion-dollar public company. This Wired feature on the company is full of amazing anecdotes. The company ran many, many websites targeted at virtually any vertical, including one called Livestrong, a franchise of none other than Lance Armstrong.

Google soon woke up to the danger and issued an update to its “algorithm”, called the Panda update, which effectively kneecapped the entire industry. Today we are looking to hear from Facebook CSO Alex Stamos, but Matt Cutts of Google was all the rage back then.

Facebook has had its fair share of “spam” problems too, and while the company might seem paralyzed in an effort to satisfy both sides, it wasn’t always that way. Zynga figured out the dynamics of News Feed, as well as the psychological reward mechanisms of unsuspecting “gamers”, and built a billion-dollar business around them. In the meantime, though, Zynga and its flagship FarmVille game became synonymous with spam. When Facebook woke up to the problem and took action, the resulting tweaks nearly killed Zynga. The gaming company is still around, as a public company, but it’s struggling to even pay for its HQ. The same pattern played out with companies like Upworthy, and many other “viral” news sources.

As an outsider, it’s not clear to me how much of an existential crisis this is for Facebook. Google’s struggle with content farms was an existential risk; users who lose trust in a search engine can jump ship to Bing or any other. Facebook users are locked in to the platform, and by the very nature of social networks, as more users join, it gets harder for the next user to leave. The social network is more or less the world’s biggest address book for many, and the filter bubbles make the problem of fake news one that only someone else can diagnose for you, not unlike a mental disorder. Some, like Sam Biddle, even argue that Facebook inherently benefits from our endless craving for drama. Russian interference in US elections propelled the problem into mainstream media, but that was unintentional.

Moreover, the numbers themselves make it a challenge. Unlike a few content farms (or virtual farms, in the case of Zynga) that can be easily identified, Facebook has 5 million advertisers who can push any sort of content to users’ news feeds. Still, it doesn’t seem like an unmanageable number. There are many businesses with a similar number of customers that seem to keep a handle on them.

It wouldn’t be great for Facebook’s bottom line to have to increase the cost per customer, but it is probably the right approach for the long term. The media and tech analyst Ben Thompson argues the same in his column (subscription might be required). Facebook flew past its competitors partly by being the saner, more refined, Ivy-grad built and approved alternative. Google probably doesn’t miss the revenue it used to earn from the content farms, and Facebook certainly doesn’t miss Upworthy. A longer-term vision would help. A company that’s building solar-powered planes that communicate with each other via gyroscopically stabilized lasers should be able to solve some spam issues.

As a side note, it’s worth mentioning the opposite examples. These cheap SEO or virality games do not always end badly for companies. For each Demand Media, there’s a “success” story like Business Insider and the like. The journalistic pasts of these organizations are questionable. They have built their businesses on borrowing content from other organizations, having fewer and more junior staff, and really playing the SEO game better than anyone. Similarly, Buzzfeed is now a serious journalistic powerhouse, but the company was decidedly built on subsidizing actual journalism with more viral, bite-sized content.

The fact that solutions will emerge only points to the chronic nature of the problem, however. Facebook, Google, or any platform can solve the spam problem, given enough resources and focus. An economy that’s based on commodified attention poses not just passing economic challenges to tech behemoths, but existential risks for a regime that’s somewhat predicated on an educated public. The history of the attention economy is the subject of Tim Wu’s excellent book The Attention Merchants, which I can’t recommend highly enough.

When people’s attention can be sold to the highest bidder, the producers with the lowest fixed costs will rule the world. A few years ago, it was Demand Media, then it was Zynga, then Upworthy and Huffington Post, and today it’s everyone. As costs of production go down (which is a good thing), the challenge will get harder. Moreover, as targeting of not just ads, but any content, becomes more precise, yet more opaque, the shared context that holds a society together will inevitably decay.

It might be a libertarian pipe dream to live free of interference from anyone, in one’s own digital and physical cocoon, but that seems untenable in the long run for a liberal democracy. At some point, we will have to get our rights to our information laid down in a more robust fashion, instead of relying on the good will of a few people living in California. Spam, as a risk to productivity, was solved by better technology, as well as regulation that required transparency for widely distributed emails. But most importantly, it got solved after we acknowledged the problem, saw the long-term risks, and attacked it at its mechanics.

With Big Data Comes Big Responsibility

It’s getting harder to suppress the sense of an impending doom. With the latest Equifax hack, the question of data stewardship has been propelled to the mainstream, again. There are valid calls to reprimand those responsible, and even shut down the company altogether. After all, if a company whose business is safekeeping information can’t keep the information safe, what other option is there?

The increased attention to the topic is welcome, but the outrage misses a key point. The Equifax hack is unfortunate, but it is not a black swan. It is merely the latest incarnation of a problem that will only get worse unless we all do something.

The main issue is this: any mass collection of personally identifiable data is a liability. Individuals whose data is vacuumed en masse, the companies who do the vacuuming, and the legislators should become aware of the risks. It is fashionable to say “data is the new oil” but the analogy only goes so far, especially when you consider the current situation of the oil-rich countries. Silicon Valley itself here is especially vulnerable.

Big parts of the tech industry in the Bay Area are built on the mass collection of such private data, and on deriving some value from it. A significant part of that value comes, somewhat depressingly, from ever more precise ad targeting. The problem with this model has long been known, if only tacitly admitted by its creators, but it wasn’t until the Snowden revelations that a real national debate picked up. With the recent brouhaha following the 2016 elections, and a real risk of an authoritarian government in the US, the questions are louder this time.

Public outcry does help, but change is very slow. Part of the reason is that the business models are wildly successful. Combined, Alphabet (née Google) and Facebook are a trillion-dollar duopoly. With a cottage industry around these two companies, and practically all stakeholders in the area either beholden or financially tied to the industry, the motivation to change is small. Some companies, like Apple, try to raise the issue to a higher plane of morality, partly for ethical reasons, partly for competitive ones. But the data keeps getting collected at an ever-increasing pace, and it’s getting more and more likely that a catastrophic event will occur.

Let’s first talk about how data gets exposed. Hacking, or unauthorized access, is the most talked about way, but it’s far from the only one. A lot of the time, it’s just a matter of a small mistake. Take Dropbox. The cloud storage company once allowed anyone to log into anyone else’s account by entirely ignoring the password check. The bug was caught quickly, but it’s a dire reminder that small mistakes can happen. And that is a point worth pondering, separate from the recent hack Dropbox suffered.

As easily as data is collected and stored, it’s even easier for it to change hands. Companies and their assets change hands, and so do the jurisdictions they live in. The Russian tech sector is a prime example. Pavel Durov, the founder of the oddly popular instant messaging platform Telegram, first built VKontakte, a Russian social network much more popular than Facebook in the country. But then came the Russian government with demands for censorship. Durov ran away, and the social network is now owned by a figure much closer to the government. And there’s always LiveJournal, which got sold to a Russian company, with all its data now under Russian jurisdiction.

And sometimes, companies open themselves up to being hacked. Once an internet darling, Yahoo! was put in the spotlight when its own security team found a poorly designed hacking tool, installed by none other than the company itself. Initially designed to track certain child pornography related emails for the government, the tool was built without the knowledge of the company’s Chief Security Officer, Alex Stamos, a well-regarded security professional. He departed the company soon after, only to join Facebook. And again, this is on top of the Yahoo! hack that affected 1 billion users and almost derailed a multi-billion dollar acquisition.

Government surveillance is a touchy subject, and moral decisions are always fuzzy, with someone left unhappy. Governments should use the tools at their disposal to keep their citizens safe, and this might sometimes require uncomfortable measures. This doesn’t mean they should be given direct access to millions of people’s private data, however. Intelligence efforts should be directed, not dragnet. Living in a liberal democracy requires a certain amount of discomfort, not pure order.

But it is hard to deny the evidence at hand: from once liberal darlings like Turkey to known autocratic regimes like China, any government will find it impossible to resist the temptation to take a peek at the data, one way or another.

Governments are made up of people, just like corporations are. The solutions to these problems won’t be easy; with so much already built, tearing it all down is not an option, or even preferable. The industries built on this data add value and employ thousands, if not millions. But we have to start somewhere, as individuals, technology companies, and legislators.

First, individuals need to be more cognizant of their decisions about their data. Some of it will require education, from a much younger age. But even today, for many, there are a lot of easy steps one can take.

For many uses, more private, less surveillance-oriented tools already exist. Instant messaging tools like WhatsApp (bought by Facebook for a whopping $19 billion) are easy to use while using cutting-edge end-to-end encryption technology borrowed from Signal. One can wonder if essentially playing spies is worth the hassle, but the risks are real, and getting more so every day, even for members of Congress in the US.

For regular browsing, things are in worse shape. Practically every site on the internet tracks you across every other site; shopping and news sites are particularly bad. Users are fighting back, with sometimes clunky, equally overzealous tools. Thanks to an overzealous adoption of ads, both intrusive and sometimes malicious, ad blocking is on the rise around the world. It is hard to fault consumers; most would benefit from using an independently owned ad blocker like uBlock Origin, or a browser like Brave that has such technology built in. Apple recently updated its browser Safari on both macOS and iOS to “intelligently” curb cross-site tracking.

For things like email, and cloud storage, things are trickier. For many users, their data is safer with a big company with a competent security team, as opposed to a smaller service provider. There’s a balance here; while big providers are much juicier targets (including governments who can request data legally), they also have the benefit of being hardened by such attacks. Companies like Google use their own services, further incentivizing them to safeguard data, at least from hackers.

However, even then, most people would benefit from raising their security above the defaults. For users of Gmail, Dropbox, and virtually any other cloud service, using two-factor authentication, coupled with a password manager, is a must.
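For the curious, the six-digit codes that authenticator apps show are not magic. Many of them follow the open TOTP standard (RFC 6238), which is small enough to sketch with Python's standard library. This is a minimal illustration of the standard, not a claim about how any particular service implements it; the secret below is the published RFC test value, and real services typically hand you a base32-encoded secret that needs decoding first.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    # Sign the 8-byte big-endian counter with the shared secret.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over 30-second windows."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# Test secret from the RFCs; never use it for a real account.
secret = b"12345678901234567890"
print(totp(secret, at=59))  # 287082
print(totp(secret))         # changes every 30 seconds
```

The server and your phone share the secret once, at setup. After that, both can compute the same code from the clock alone, so a stolen password is not enough on its own.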

And more broadly, going back to cognizance, individuals must be aware of the data they provide and be at least minimally informed. When you sign up for a new service, before sharing all your data with them, see if they at least have a way to delete it, or export it. Even if you never use either of those options, they can be good signs that the company treats your data properly, instead of letting it seep into their machinery.

For creators of such technology, things are harder, but there’s hope. The first step is obvious: companies should treat personally identifiable data as a liability and collect as little as possible, and only for a specific purpose. This is also the general philosophy behind the EU’s new General Data Protection Regulation (GDPR). Instead of collecting as much data as possible, hoping to find a good use for it later, companies should only collect data when they need to. And most importantly, they should delete the data when they are done with it, instead of hoarding it.

Moreover, companies should invest in technologies that do not need to collect data at all, such as using client-side computation instead of server-side. Apple is the prime example here; the company uses machine learning models that are generated on the server, on aggregate data, for things like image recognition or speech synthesis on the devices themselves. Perhaps a sign of poetic justice, the intelligent cross-site tracking prevention Apple built into its browser is based on data collected in aggregate form, instead of in a personally identifiable fashion.

It is not clear if such technologies can keep up with server-based solutions, where iteration is much faster, but the investments might pay dividends. Today’s smartphones easily compete with the servers of just a few years ago in performance. Things will only get better.

And for times when mass collection of data is required, companies should invest in techniques that allow aggregate collection instead of personally identifying data. There are huge benefits to collecting data from big populations, and the patterns that emerge from such data can benefit everyone. Again, Apple is a good example here, though Uber is also worth mentioning. Both companies aggressively use a technique called differential privacy where private data is essentially scrambled enough to be not identifiable but still the patterns remain. This way, Uber analysts can view traffic patterns in a city, or even do precise analysis for a given time, without knowing any individual’s trips.
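To make “scrambled enough” concrete, here is a minimal sketch of one textbook form of differential privacy, the Laplace mechanism applied to a count. It is an illustration of the general technique, not Apple's or Uber's actual implementation; the trip records, `laplace_noise`, `private_count`, and the choice of `epsilon` are all assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential samples with the same
    rate is Laplace distributed, so the stdlib is enough here.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Count matching records, then add noise calibrated to epsilon.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


# Made-up trip records: every fourth rider went downtown.
rng = random.Random(42)
trips = [
    {"rider": i, "zone": "downtown" if i % 4 == 0 else "suburbs"}
    for i in range(1000)
]

noisy = private_count(trips, lambda t: t["zone"] == "downtown", epsilon=0.5, rng=rng)
print(round(noisy))  # close to the true count of 250, but not exact
```

An analyst who only ever sees the noisy numbers still learns that roughly a quarter of trips end downtown, but cannot tell from the answer whether any single rider is in the data.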

And more generally, companies should invest and actively work on technologies that reduce the reliance on individuals’ private data. As mentioned, a big ad industry will not go away overnight, but it can be transformed to something more responsible. Technologists are known for their innovative spirit, not defeatism.

End-to-end encryption is another promising technology. While popular for instant messaging, the technology is still in its infancy for things like cloud storage and email. There are challenges; the technology is notoriously hard to use, and recovery is problematic when someone loses access to their encryption key, for example by forgetting their password. Maybe most importantly, encryption makes the data entirely opaque to storage companies, severely limiting the value they can provide on top of it.

However, there are solutions, some already invented, some being worked on. WhatsApp showed that end-to-end encryption can be deployed at massive scale and made easy to use. Other companies like Keybase work on more user-friendly ways to do group chat, and possibly storage, while also working on a new paradigm for identity. And there are also more futuristic technologies like homomorphic encryption. It is still in the research phase, but if it works as expected, the technology might make it possible to build cloud storage services where the core data is private while still being searchable or indexable. Technology companies should direct more of their research and development resources to such areas, not just to better ways to collect and analyze data.

And lastly, legislators need to wake up to the issue before it is too late. The US government should enshrine the privacy of individuals as a right, instead of treating it as a commercial matter. Moreover, mass collection of personally identifiable data needs to be brought under supervision.

The current model, where an executive responsible for leaking 140M US consumers’ data can get away with a slap on the wrist and a $90M payday, does not work. Stronger punishment would help, but preventing such leaks at the source by limiting the size, fidelity, or longevity of the data would be better.

Moreover, legislators should work with the industry to better educate consumers about the risks. Companies will be unwilling to share details about what is possible with the data they have on their users (and unsuspecting visitors), but it is better for consumers to make informed decisions in the long run. Target made headlines when it reportedly figured out a woman was pregnant before she could tell her parents. Customers should be made aware of such borderline creepy technology before they become subject to it, all the more so considering Target itself was also a victim of multiple major hacks. Facebook was recently the subject of a similar report, in which the company discovered a family member of a tech reporter (the same reporter who broke the Target story), and it is unclear to everyone how. Individuals should not feel this powerless against corporations.

The current wave of negative press against Silicon Valley, caused mostly by the haphazard way social networks were used to amplify messages from subversive actors, is emotionally charged but is not wholly undeserved. Legislators can and should help technology companies earn back people’s trust, by allowing informed debate about their capabilities. A bigger public backlash, when it happens, would make today’s pessimism seem like a nice day in the park.

There are huge benefits to mass amounts of data. There is virtually no industry that wouldn’t benefit from having more data. Cities can make better traffic plans, medical researchers can study diseases and health trends, and governments can make better policy decisions. And it can be commercially beneficial too; with more data we can make better machine learning tools, from cars that can drive themselves to medical devices that can identify a disease early on. Even data that is collected for boring purposes can become useful; Google’s main revenue source is selling ads on top of its search results, which no user would want to get rid of.

Data might be the new oil, but only with mindful, responsible management of it will the future look like Norway, rather than Venezuela or Iraq. In its essence, personally identifiable data in huge troves is a big liability. And the benefits we currently derive from such data are mostly used for things like better ad targeting. No one wants to go back to a time without Google or Facebook. But it is possible to be more responsible with the data. The onus is on everyone.