Algorithms as a tool to better journalism

Wired’s recent story about Narrative Science seems to have put some journalists into a bit of a tizz. The article is a must-read for journalists and coders, full of really interesting tidbits about what’s going on in this field now and what might come to pass in the future.

I’m actually very excited about the possibilities of Narrative Science, an artificial intelligence product that transforms data (currently mostly from the sports and finance worlds) into stories. This is exactly the kind of thing we’re after when we encourage J-Schools to put software engineering into journalism curricula: teach young journalists valuable new skills so that they, in turn, don’t end up helpless on the sidelines, as many of us current journos have been during the technology advances of the last decade.

The method does not determine the value

Narrative Science is not a threat; it’s a tool, and it fills a need. Instead of some capable writer poring over boring financial statements and trying to add sizzle in reporting on them, a machine reads the data and spits out two grafs. Two serviceable but really snoozy grafs, which is probably what a human would have written, too.

Here’s what’s intriguing, though: Narrative Science is working on ways to be not-snoozy, and in so doing they’re calling us journalists on our BS, in a way. What I mean is this: Journalists have formulas. We do, and they’re taught in schools and learned on the job. “Inverted pyramid.” “Nut graf.” “Lede.” “Attribution.” These are plug-and-play tactics most of the time. Sure, these elements vary from story to story, and that is the fun part of what we do. We add details and context. We observe and report. But at core, we tell different stories using slightly different combinations of these tactics and tools.

Arguably, feature stories have slightly more variety, but I’d also point out that (sadly) many features are also just puzzle pieces, if not downright parodies of themselves. For example, every feature on every female celebrity ever starts this way:

“[Lady celeb] walks into [L.A.’s or New York’s] [restaurant or cafe in trendy neighborhood] looking gorgeous in [brand] jeans and no makeup.”

Whether it’s the editors or the writers making the words hacky, hacky they are, and boring, just like the pieces Narrative Science is creating with its algorithmic journalism. Fascinatingly, according to Wired, the company actually has “meta-writers” whose job it is to help the computers add context:

“[Meta-writers are] trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various ‘angles’ from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?”
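To make that description concrete, here is a minimal sketch of what “templates plus angles” could look like. It is emphatically not Narrative Science’s code: the `Game` record, the thresholds, the team names and the template wording are all hypothetical, invented here for illustration. It just shows the general shape: compute a few facts from the box score, pick the most interesting angle, and fill in a pre-written sentence.

```python
from dataclasses import dataclass
from itertools import accumulate

# Hypothetical thresholds a "meta-writer" might tune; not from the Wired piece.
BLOWOUT_MARGIN = 6      # final margin that counts as a blowout
COMEBACK_DEFICIT = 3    # largest deficit overcome that counts as a comeback
STREAK_LENGTH = 3       # losing streak long enough to be worth mentioning

# Hypothetical templates, one pre-written lede per angle.
TEMPLATES = {
    "comeback": "{winner} rallied from {deficit} runs down to beat {loser} {w}-{l}.",
    "blowout": "{winner} routed {loser} {w}-{l}.",
    "streak": "{winner} snapped a {streak}-game losing streak with a {w}-{l} win over {loser}.",
    "plain": "{winner} beat {loser} {w}-{l}.",
}


@dataclass
class Game:
    home: str
    away: str
    home_innings: list              # runs scored by the home team, per inning
    away_innings: list              # runs scored by the away team, per inning
    winner_losing_streak: int = 0   # context pulled from "another database"


def write_lede(game):
    """Return (angle, sentence) for a finished game."""
    home_total, away_total = sum(game.home_innings), sum(game.away_innings)
    if home_total == away_total:
        raise ValueError("this sketch only handles games with a winner")

    home_won = home_total > away_total
    winner, loser = (game.home, game.away) if home_won else (game.away, game.home)
    w, l = max(home_total, away_total), min(home_total, away_total)

    # Winner's lead after each completed inning (negative means trailing).
    running = zip(accumulate(game.home_innings), accumulate(game.away_innings))
    leads = [(h - a) if home_won else (a - h) for h, a in running]
    deficit = max(0, -min(leads))

    # Check the angles in order of (assumed) newsworthiness.
    if deficit >= COMEBACK_DEFICIT:
        angle = "comeback"
    elif w - l >= BLOWOUT_MARGIN:
        angle = "blowout"
    elif game.winner_losing_streak >= STREAK_LENGTH:
        angle = "streak"
    else:
        angle = "plain"

    sentence = TEMPLATES[angle].format(
        winner=winner, loser=loser, w=w, l=l,
        deficit=deficit, streak=game.winner_losing_streak,
    )
    return angle, sentence


game = Game("Hawks", "Owls",
            home_innings=[0, 0, 0, 0, 0, 2, 0, 3, 1],
            away_innings=[3, 1, 0, 0, 0, 0, 0, 0, 0])
print(write_lede(game)[1])
```

Notice how much of the “writing” lives in the templates and the order of the angle checks. Those are editorial judgments made by people, which is presumably why the company hires journalists to make them.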

But to answer the question posed in the headline of the piece, “Can an Algorithm Write a Better News Story Than a Human Reporter?” for now the answer is no. And journalists vs. algorithms is a faulty comparison.

Writers and editors add value using tools

Narrative Science, thanks to algorithms created by human engineers and journalists, is now at the level of being able to programmatically spit out phrases like “whacking home runs.” But it can’t gauge a crowd’s restlessness or excitement. It can’t interview a superfan after the game, sense that he’s fed up with the team and write a mood piece. It can’t connect on a human level to a victim of a crime, or spend days following a subject then put together disparate threads of the subject’s life into a coherent portrait.

Which is why it’s not a real threat just yet. The way I see it:

Narrative Science : journalists : : spell-check : copy editors

It’s a tool that does a programmatic task as well as a human does, but not a contextual one. Does spell-check tell you that you have the wrong “hear/here”? No. Does it correct you when you’ve spelled “embarrassing” incorrectly because it is drawing from an enormous database of correctly spelled words? Sure, easy enough. Can it check a fact’s accuracy against a thousand links on the Internet? Probably. But can it call a source and make sure she wasn’t misquoted, then correct the quote before publication? Not likely.
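For the coders in the audience, here is a toy illustration of that hear/here gap. It is a sketch under loud assumptions: the eight-word `DICTIONARY` is made up, and the approach (look each word up in a list, then suggest the closest entry using Python’s standard `difflib`) stands in for whatever a real spell-checker does. The point is only that a word-list lookup catches a misspelling but has no way to notice a correctly spelled wrong word.

```python
import difflib
import re

# A tiny, made-up word list standing in for the "enormous database".
DICTIONARY = sorted({"i", "can", "hear", "here", "you", "that", "was", "embarrassing"})


def spell_check(text):
    """Return (word, suggestion) pairs for words not found in DICTIONARY."""
    problems = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in DICTIONARY:
            # Closest dictionary entry by character similarity, if any.
            close = difflib.get_close_matches(word, DICTIONARY, n=1)
            problems.append((word, close[0] if close else None))
    return problems


print(spell_check("That was embarassing"))  # the misspelling is flagged
print(spell_check("I can here you"))        # the wrong word sails through
```

The second sentence passes because every word in it is a real word. Catching it takes context, which is exactly what the copy editor brings.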

Context is everything, and it’s ours to use. But we journalists have to use it. Yes, we have formulas. We write ledes, and we edit the story so the most important information is up front. But we have to step up our game. We have to go to the match, or the crime scene, or the meeting, or the fashion show, or the foreign city, or the war, and add context for readers. We shouldn’t hack our way through the really interesting stuff — we shouldn’t be allowed to. Let’s let bottom-scrapers scrape the bottom for us. Let’s not waste human effort on shitty content farms that pay $2 (!) an article. Let’s leave that for robots and invest elsewhere: in hiring more and better writers and editors to make connections, describe the atmosphere, make sense of things, tease out themes and (cue dramatic music) better humanity. Let’s invest in creating data and algorithms that we can program to help us help ourselves.


Nisenholtz on content and tech

Four important bits from this interview with former NYT digital guy Martin Nisenholtz.

“Human-mediated content is important to me because it both introduces a hierarchy of importance as well as a kind of serendipity.”

“If you’re in the business of creating news and information, you get these kind of blinders, where you think everybody is into it. But the fact is, when you go out and you talk to people who are not in the business, they’re leading their lives and doing what they do, and for them everything is just totally optional. … [99 percent of people] care about how what you do affects their lives. Unless you touch them, in a very meaningful way, you will fail. If you focus on the technology, or focus on what will be cool about it to a very small group of people, it’s just not going to work.”

“I really think it’s important for traditional news sources to embrace the technology side of our business — and really understand what the application side can do for content. Not just publishing content from one source and porting it into a bunch of templates.”

Here he’s referring to Twitter, but this is arguably the principle behind the rise of Facebook, too, and the stagnancy of Google Plus:

“If there are no other people on the network, it’s going to be pretty useless. But the more people that join the network, the richer it gets.”


Blogging and journalism

Smart thoughts from GigaOM as the HuffPost wins a Pulitzer and the NYT launches another stand-alone blog:

“The question ‘are blogs journalism?’ — or similar questions such as ‘Is Twitter journalism?’ — make no sense any more, if they ever did. Are telephones journalism? Are pencils and pens journalism? No. They are just tools. A blog is also just a tool, one which can be used for journalism and for many other things as well.”

I mostly agree, but that being said, I think there is a big difference between original reporting and aggregation, between thinking and curating. The tools of blogging have made the latter items much easier to do.

One challenge for so-called old media in adapting to the new world order is that the audience still expects quality original reporting from them. It’s difficult, if not impossible, to fund news analysis, foreign bureaus, unions, reporters on assignment, long-form journalism, spotless editing (etc.) and still make payroll, while your upstart competitors do not bear these burdens but often do reap the rewards of the spending.


Must-read article of the moment

Fascinating, well-reported and just epic (and lengthy) piece in CJR. Covers everything from clusters and strong networks (and Arianna Huffington’s charm in the creation at scale of both) to the ill-conceived AOL Way. Manages to discuss what it means to have conversations with readers, the difference between content and journalism, and the magic of good timing and serendipitous, seemingly unrelated events. While acknowledging that some things just happen, also recommends SEO’ing the hell out of content to grease the skids. References Lord of the Flies and “Why wasn’t I consulted?” Frankly, suggests a new paradigm for business: embracing failure and iterating. Epic!



Personalization by way of automation

Interesting peek at what happens when personalization by way of automation and algorithms takes a dark turn.

“Unlike tabloid television, algorithmic personalization does not announce that it’s pandering to base interests. When sensationalized reports about violence against children are on TV, I can change the channel — an act that is harder to do on the Internet when seemingly ‘neutral’ spaces, like Yahoo’s homepage, leave no tell-tale trace of manipulation.  You can’t change the channel when you don’t know you’re watching the program.”

Another argument in favor of the human curator (read: editor) as we stumble through sorting out what can be programmed, what should never be, and where the middle ground is.


Search vs. social

The sands of the Internet are constantly shifting underneath us. One major example is content distribution and audience reach via search vs. social. So much has changed, even in the last year, in how people find information. This article is ostensibly about how Google+ isn’t a Facebook killer, but the part that stood out to me was this:

“Once upon a time…you hopped onto a search engine, plugged in a search term, found what you were looking for and went your merry way. [But] sharing and following and ‘liking’ and so forth have become the primary way people gather and dispense information. Search is still a big part of the equation, but social is getting bigger.”

In many cases (many, certainly, but not all), people trust their networks more than they trust a search engine’s results. It’s a fundamental shift that content creators must adapt to. It’s no longer just about gaming SEO to rank in search; it’s about creating quality, sharable, trustworthy content.


Content as a product

Really interesting piece on treating content as a product and what that means for scaled production, if there is such a thing:

“You can’t apply industrial-age economics to content production. Content doesn’t get cheaper as the volume goes up. Unlike Ford’s automobiles, the cost of quality content goes up with the volume because content production involves skilled labour and very few economies of scale. Plenty of organisations try to work around this hard fact using various forms of automation.”

The author’s point is that it’s a gamble to “churn content without a plan”:

“While automated tools can be useful, letting general trending topics or ill-chosen metrics replace a strong editorial strategy will drive the relevance of your content down. Licensed or crowd-sourced content will rarely be tailored for your audience’s needs, the tone of voice they are most accustomed to, etc. Every mismatch drives relevance down and reduces your chances of the gamble ever paying off.”

I’d further argue that, content-wise, in the age of social sharing and a meme a minute, tone and trust are everything.
