"The New York Times newsroom," by Marjory Collins, Wikimedia Commons

The number of employed journalists in the United Kingdom has declined by 6,000, or 9%, since 2013 as falling ad revenue has forced publishers to cut costs, and newsrooms on the other side of the pond have seen similar losses. This means media outlets are cutting content creators at a time when they are demanding more content. Some experienced writers have been replaced by younger, cheaper ‘digital natives’, but publishers will increasingly use fast-acting, data-hungry robo-journalists instead.

If you think that’s far-fetched, think again: they’re already here and learning fast.

The Career Path of the Robo-Reporter

Several publishers including The New York Times, LA Times and Forbes employ robots: clever computer programs that use algorithms to gather information and natural language generators to churn out reader-ready copy.

They’ve shown a natural aptitude for data, but “careers” that started on sport and business desks are now moving into breaking news and investigative journalism.

Like all junior reporters, the robots are learning from their copy editors. In this case, though, the ‘subs’ are there to re-program them literally, not metaphorically.

Like junior reporters, they can learn from and draw on a back catalogue of great writing—but with more powerful memories and analytical techniques.

A few big publishers will understand their potential and let them shine, while others will only ever give them mundane jobs.

They’ll open doors for the sort of nimble new companies that arrive during disruption, able to use technology for what it is suited to.

Silicon Valley is already immersed in the technology behind robo-reporting but will use it first in fields like healthcare. They’ll enter robo-journalism later, buying or eliminating the newcomers.

How Good is Robo-Journalism?

"Spaceman Robot," by D J Shin, Wikimedia Commons

It has taken ages to reach a point where people in tests can’t tell the difference between machine-written articles and similar articles by humans. But a key feature of the ‘new machine age’ is that slow development quickly turns to accelerated gains.

A hard exercise has been getting journalists to verbalize what they’ve learnt to do instinctively. Once verbalized, those lessons are turned into algorithms. Machines can then trawl through wire stories, the Internet, press releases and data sets, finding and writing stories.

They don’t, of course, knock on doors, burn shoe leather or make contacts and phone calls. However, they can do the same tasks as the increasing proportion of journalists set to aggregating and repackaging news or making sense of the increasingly digitized data that informs the news.

After algorithm creation there’s slow fine-tuning. This is a labor-intensive process of human reprogramming, but it is coming on apace.

The Associated Press once checked everything machines produced, but now it puts the majority of it on the wire directly.

Other companies will check everything, but decoupling machines from over-zealous human chaperoning will be essential to take full advantage of what robo-reporters can offer. It is what will make new entrant companies nimble; it is what will hold back established publishers. Machines produce more if checking doesn’t slow output.

Machine Learning

Machines can learn language from large banks of expensively produced, comparable texts.

They translate between languages by comparing decades of EU and UN reports, expensively translated into multiple languages by humans. When asked to translate a sentence, they scan these translations to find a close match or a few fragments they can add together.
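The matching step described above can be illustrated with a toy sketch. It assumes a tiny, made-up translation memory and uses Python’s standard `difflib` module to score similarity; real systems draw on far larger corpora and more sophisticated models. The names `TRANSLATION_MEMORY` and `closest_translation` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical miniature "translation memory": sentence pairs that
# humans have already translated (English source, French target).
TRANSLATION_MEMORY = [
    ("The committee approved the budget.", "Le comité a approuvé le budget."),
    ("The council rejected the proposal.", "Le conseil a rejeté la proposition."),
    ("The meeting is postponed.", "La réunion est reportée."),
]


def closest_translation(sentence, memory=TRANSLATION_MEMORY):
    """Return (score, source, target) for the stored source sentence
    that is most similar to `sentence` (score is between 0 and 1)."""

    def similarity(pair):
        # Compare case-insensitively against the source side of the pair.
        return SequenceMatcher(None, sentence.lower(), pair[0].lower()).ratio()

    best = max(memory, key=similarity)
    return similarity(best), best[0], best[1]


score, source, target = closest_translation("the meeting is postponed")
print(f"{score:.2f} | {source} -> {target}")
```

This only covers the "find a close match" case; stitching together fragments from several stored sentences would need a more elaborate approach.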

Similarly, because news media publish to the open web, machines can compare how publishers cover the same story. They learn alternative phrases, different approaches, narratives, tones and house styles.

It means they can be set to write with a particular skew: in support of a sports team or against a political party.

They can learn in an unsupervised way. They can absorb captions under hundreds of pictures and so describe what is in a new picture. They can test their understanding of stories against summaries like those CNN and MailOnline use in articles. They can learn in dynamic environments, reacting to events around them. They don’t always need a human to slowly feed them knowledge.

This rich back-catalogue of digitized articles is also a source of facts to draw on. Machines have powerful memories. Fact checking is fast.

Paul Pierotti, Managing Director of Accenture Digital, says, “This technology is being used in healthcare firstly because of its ability to digest vast amounts of textbook knowledge and new research; secondly because it can diagnose what it sees in pictures or in patient data; thirdly, it can use language to report the diagnosis along with supporting evidence and recommendations. The reasoning and language will evolve to feel human. If the healthcare industry can harness that potential so too can news companies.”

Data Analysis

"Big Data," by DARPA, Wikimedia Commons

Machines are adept at investigating data sets. Publishers have set them to work on tax records, homicide data, meteorological reports and more, looking for patterns and describing them. They’re thorough, fast and not prone to error.

The LA Times uses robo-journalism to break news about earthquakes because machines can analyze geological survey data faster than a human. It takes under five minutes to spot a story and get it online.
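A minimal sketch of how a data feed might become a short news item, assuming a simple fill-in-the-blanks template. This is not the LA Times’ actual system: the function `quake_story`, the field names, the magnitude threshold and the sample event are all invented for illustration.

```python
def quake_story(event, min_magnitude=3.0):
    """Turn one survey reading into a one-sentence story.

    Returns None when the quake is below the (assumed) newsworthiness
    threshold, so small tremors produce no copy.
    """
    if event["magnitude"] < min_magnitude:
        return None
    return (
        f"A magnitude {event['magnitude']:.1f} earthquake struck "
        f"{event['distance_km']} km from {event['place']} at {event['time']}, "
        "according to geological survey data."
    )


# Invented sample reading; a real feed would supply these values.
sample_event = {
    "magnitude": 4.7,
    "place": "Westwood, California",
    "distance_km": 10,
    "time": "6:25 a.m.",
}

print(quake_story(sample_event))
```

Because the template is filled the moment the data arrives, the only delay before publication is whatever checking a publisher chooses to add.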

Robo-journalists are arriving at a time when the lack of data skills amongst journalists is starting to show. Peter Bale, Managing Director of CNN, observed at a Reuters Institute Big Data event that traditional journalists who aired opinions based on very little proof were being embarrassed by people, often outside the industry, who could draw more solid conclusions from data. Machines will help publishers catch up.

Personalization

Machines can produce multiple versions of an article to make it more ‘personal’—to give it local flavor, for example.

Again, speed is important. Re-writes are produced in a fraction of the time it would take a human so stories are both current and personal.
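As a rough sketch of the idea, one national story can be rewritten once per region from the same data. The figures, region names and the `local_version` function below are placeholders invented for illustration, not real statistics or any publisher’s method.

```python
# Placeholder figures, invented purely for illustration.
NATIONAL_RATE = 5.0
REGIONAL_RATES = {"Leeds": 6.2, "Bristol": 4.1, "Cardiff": 5.0}


def local_version(region, rate, national=NATIONAL_RATE):
    """Write one sentence that frames a regional figure against the national one."""
    if rate > national:
        comparison = "above"
    elif rate < national:
        comparison = "below"
    else:
        comparison = "in line with"
    return (
        f"Unemployment in {region} stands at {rate:.1f}%, "
        f"{comparison} the national rate of {national:.1f}%."
    )


# One localized version per region, generated in a single pass.
versions = {
    region: local_version(region, rate)
    for region, rate in REGIONAL_RATES.items()
}

for text in versions.values():
    print(text)
```

Adding a region means adding one row of data, which is why the cost of each extra version is close to zero.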

As well as location, personalization might be based on the demographics or behaviors of groups of readers, as determined by their online activity. Publishers already target advertising like this.

Articles can also be re-written based on what an individual shows they know about a topic, for example through an interactive element in the article. The machine then describes the difference between the reader’s perception and reality.

We’ve always read between the lines to understand how we are personally affected or to see how reality differs from what we assume. In the future we will only need to read the lines themselves to understand those things.

Language translation is another form of personalization. Publishers from De Correspondent to The Economist have ambitions to find new customers amongst speakers of other languages. Machine translation offers a way to reach them.

Finally, machines can write tirelessly. By covering more topics they’re more likely to write about your football team, your industry and so on. News feels more personal.

Interest to Advertisers

by kaboompics, Pixabay

Robo-journalism will be of interest to the advertising department. They’ve built native advertising units to write copy for advertisers—and charge a premium. Personalization means bigger premiums.

On the flip side, what if robo-journalism technology got into advertiser hands? They already plan to buy native at scale across many sites. What’s missing is the ability to speedily and cost-effectively re-write content to suit different publishers’ environments. If technology enables it, they can force prices down.

In Conclusion

The slow part is over. The rate of development will accelerate. Costs of entry will drop, as expensive lessons learnt by machines will be cheaply replicated in others.

Publishers will battle nimble robo-publishers and advertisers seeking to drive down cost. They’ll need to fully embrace the twin opportunities of data interpretation and personalization—and avoid chaperoning machines too closely.

Before you know it, these challenges will be upon us.

This post originally appeared in TheMediaBriefing and has been adapted for Sparksheet’s audience. Read the original here.