This article was written by Joseph Bernstein for BuzzFeed News in June 2015, and is the second article in a BuzzFeed series written with help from Columbia University’s History Lab.
How do we decide what counts as history? Well, there’s the first draft, journalism — the stories the media tells about the events of the day. And then there are the endless subsequent iterations, mined from primary sources and dusted off and polished by historians into arguments and narratives that shape our understanding of the world.
Then there’s a third option, one made possible by the deluge of electronic records kept in the second half of the 20th century and by the tools of modern data science: automatic event detection. That’s the idea that software can read historical data and pick out patterns, discrete events that stand out from an ocean of data as significant.
In the early 1970s, the State Department began keeping electronic records of the thousands of cables its employees sent about American interests throughout the world. Researchers at Columbia’s Declassification Engine project believe it’s possible to automatically distinguish periods of increased activity in these cables that correspond to historically important events.
Three Columbia University statisticians — Rahul Mazumder, Yuanjun Gao, and Jonathan Goetz — developed an advanced statistical model that allowed them to sift through 1.7 million diplomatic cables from the years 1973–1977, including 330,000-odd cables in which only the metadata has been declassified. The model, with the help of the 2,600 cores in Columbia’s High Performance Computer Cluster, isolated 500 “bursts” — periods of heightened activity where more cables were being sent. And from those 500, the team investigated the top 10, what you might call the most active areas of American diplomacy in a four-year span that included the end of the Vietnam War, roiling conflict in the Middle East, and the OPEC oil embargo.
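To make the idea of a “burst” concrete, here is a minimal sketch in Python. It is not the Columbia team’s model, which the article describes only as an advanced statistical model. It swaps in a much simpler stand-in: flag any run of days whose cable count rises well above the typical day, then rank those runs by total volume. The function name `find_bursts`, the threshold `k`, and the daily counts are all hypothetical, made up for illustration.

```python
from statistics import median


def find_bursts(counts, k=5.0):
    """Rank runs of unusually busy days in a list of daily cable counts.

    Illustrative stand-in only, not the method used by the researchers.
    A day is "busy" if its count exceeds median + k * MAD, where MAD is
    the median absolute deviation from the median.
    Returns (start_index, end_index, total_cables) tuples, largest first.
    """
    base = median(counts)
    # Fall back to 1.0 so a perfectly flat series does not give a zero spread.
    mad = median([abs(c - base) for c in counts]) or 1.0
    cutoff = base + k * mad

    bursts = []
    start = None
    for i, c in enumerate(counts):
        if c > cutoff and start is None:
            start = i  # a burst begins
        elif c <= cutoff and start is not None:
            bursts.append((start, i - 1, sum(counts[start:i])))
            start = None  # the burst has ended
    if start is not None:
        # The series ended while a burst was still in progress.
        bursts.append((start, len(counts) - 1, sum(counts[start:])))

    return sorted(bursts, key=lambda b: b[2], reverse=True)


# Hypothetical daily counts: a quiet baseline with two spikes.
daily = [10, 12, 11, 9, 10, 40, 55, 38, 11, 10, 12, 30, 9, 10]
top_bursts = find_bursts(daily)
print(top_bursts)
```

On this made-up series the sketch finds two bursts, a three-day run and a single-day spike, and ranks the longer one first. Picking the “top 10” from a ranked list like this would be a matter of slicing, though a real analysis of 1.7 million cables would also need to account for topic, sender, and recipient rather than raw volume alone.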
Read the full article on BuzzFeed News.