
What the coding of a web page has to do with the quality of the news on it

A simple look at the components of an HTML page tells a lot about the reliability of its contents. Problem is, distribution platforms don’t bother looking at those signals. (Part of a series about my News Quality Scoring Project.)

As a microbiologist once said, the devil is not in the details, but in the structure. She was referring to the genetic arrangement of a deadly strain of virus. The digital world bears some resemblance to a living organism: it morphs constantly, it is unstable, and there is a lot of organic garbage strewn around. On The Guardian, for example, every character of article text comes with about 100 characters of code; more on that here.
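One way to make a ratio like that concrete is to measure it. The sketch below uses only Python's standard `html.parser` to count visible text characters (skipping `<script>` and `<style>` contents) against everything else in the page. It rests on our own simplifying assumptions (whitespace ignored, character entities decoded) and is not the method behind the Guardian figure; the function and class names are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that a reader would actually see.
        if self._skip_depth == 0:
            self.chunks.append(data)


def code_to_text_ratio(html: str) -> float:
    """Characters of markup and code per character of visible text.

    Whitespace is ignored on both sides (a simplifying assumption).
    Returns infinity when the page has no visible text at all.
    """
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    text_chars = sum(len("".join(chunk.split())) for chunk in parser.chunks)
    total_chars = len("".join(html.split()))
    if text_chars == 0:
        return float("inf")
    return (total_chars - text_chars) / text_chars
```

Fed the raw HTML of an article page, a value near 100 would match the order of magnitude quoted above; a bare `<p>abcd</p>` gives 1.75 (seven characters of tags for four of text).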

Journalism’s visual tradition dictates providing a minimum set of elements that let readers assess the origin of information. For instance, a story must display where it was reported or written, and by whom. We are supposed to know a little about the authors, sometimes with access to their bio and body of work. The Trust Project at Santa Clara University focuses on developing standards for better journalistic transparency (see its list of indicators). My own project at Stanford’s John S. Knight Fellowship complements the Trust Project.
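Some of these elements, such as a byline and a publication date, often leave machine-readable traces in a page's HTML. The sketch below checks for a few common conventions: the HTML `author` meta name, the Open Graph `article:published_time` property, schema.org `itemprop` microdata (`author`, `datePublished`), and `rel="author"` links. Which markers NQS actually relies on is not stated here; this list is our assumption, the names `TransparencySignals` and `transparency_signals` are hypothetical, and a page can of course display a byline without any of them.

```python
from html.parser import HTMLParser


class TransparencySignals(HTMLParser):
    """Records whether a page exposes machine-readable byline and date markers.

    The markers checked are illustrative assumptions, not an exhaustive list.
    """

    def __init__(self):
        super().__init__()
        self.signals = {"author": False, "published_date": False}

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <input disabled>) arrive as None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            # <meta name="author" content="...">
            if a.get("name", "").lower() == "author" and a.get("content"):
                self.signals["author"] = True
            # <meta property="article:published_time" content="...">
            if a.get("property", "").lower() == "article:published_time":
                self.signals["published_date"] = True
        # schema.org microdata, e.g. <span itemprop="author">
        itemprops = a.get("itemprop", "").lower().split()
        if "author" in itemprops:
            self.signals["author"] = True
        if "datepublished" in itemprops:
            self.signals["published_date"] = True
        # <a rel="author" href="..."> pointing to an author page
        if tag == "a" and "author" in a.get("rel", "").lower().split():
            self.signals["author"] = True


def transparency_signals(html: str) -> dict:
    """Return which transparency markers were found in the page."""
    parser = TransparencySignals()
    parser.feed(html)
    parser.close()
    return parser.signals
```

Binary flags like these are deliberately crude; presence of a marker says nothing about whether the named author is real, which is why such signals would need to be cross-checked.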

The News Quality Scoring Project (NQS) aims to find and quantify “signals” that convey content quality. The idea is to build a process that is scalable and largely automated. Incidentally, it will also help debunk fake news by “triangulating” questionable sources (see this previous Monday Note).

As of today, we have just completed a collection of 640,000 articles, the result of three weeks of gathering data from 500 of the largest American websites and their 850 corresponding RSS feeds. The task now is to extract and analyze relevant signals, and to assess their relevance, reliability, and resistance to tampering (more on this in a few weeks).
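Gathering articles from RSS feeds over several weeks means polling the same feeds repeatedly and keeping only links not seen before. The sketch below shows that step with Python's standard `xml.etree.ElementTree`, assuming RSS 2.0 feeds whose bylines sit either in `<author>` or in the Dublin Core `<dc:creator>` element. It is an illustration, not the project's actual collector; `parse_rss_items` and `new_links` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, commonly used for bylines in RSS feeds.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def parse_rss_items(xml_text):
    """Extract basic fields from each <item> of an RSS 2.0 feed (str or bytes)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        author = item.findtext("author") or item.findtext(DC_NS + "creator") or ""
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pub_date": (item.findtext("pubDate") or "").strip(),
            "author": author.strip(),
        })
    return items


def new_links(items, seen):
    """Return links not collected yet, recording them in `seen` (a set)."""
    fresh = []
    for item in items:
        link = item["link"]
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh
```

Calling `new_links` with the same `seen` set on every polling pass means a story that stays in a feed for days is counted only once.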

Coming back to the HTML structure approach, let’s look at the components of a basic article on the web:

To read on, go to:

