Is the exercise possible on a Windows machine?

At the meeting I couldn't install the software


Recap – my first Meetup didn’t go well

In another blog I talked about my first Meetup which was part of my MA in Online Journalism. My assignment is to build up my journalism network.

I was originally going to stick to an industry I know well, but decided that this would be a waste of a learning experience. I am looking at building a network of journalists and coders.

My first effort was to attend a Journocoders meeting in London. The meeting was an exercise in working with geographical data.

However, my attempt didn’t go well as my notebook would not load the software. Was this a Windows or a machine issue? I decided to try the exercise again on my desktop machine.

My homemade PC

My desktop machine

This is more powerful than my HP Stream. It is homemade, with an AMD A6-6400K, running at 3.9GHz and it has 8GB of RAM, running 64 bit Windows 10 Home.

Is this good? I’ve no idea, the choices were based purely on the money I found down the back of the sofa.

 

The exercise

Before the meeting we were asked to install the latest version of Node. This I managed without any issues. Node.js® is a JavaScript runtime built on Chrome’s V8 JavaScript engine. It uses an event-driven, non-blocking I/O model.

Which, to be honest, means nothing to me.

After the event, the instructions were amended to:

Windows users

An alternative way of installing everything you need is:
  1. Download Cygwin
  2. When you install Cygwin, the installer will ask you which packages you need. Make sure the Node and Make packages are selected.

General advice:
You need to host your HTML+JS+CSS on a server! Can’t run on local machine
My mistake — tutorial has been updated

I didn’t follow this amendment; however, the note that this can’t be run on a local machine did set me off on a wild goose chase. This was ended when I contacted someone from my old business network.

It is easy to forget what you already have when you are chasing something new.

Step One

The exercise was to show how to take a dataset containing the outlines of all the boroughs of London, and combine it with data showing average income.

The combined dataset will be used to build an interactive heatmap showing how income varies across London. The tools used were mostly built by Mike Bostock, who also wrote the exercise.

The first thing Windows users had to do was to open PowerShell as an administrator. I must admit that I didn’t know that PowerShell existed. I’ve always used cmd.

We had to give PowerShell permission to run things.

We then had to install Chocolatey, which is a software automation package for Windows and helps install, configure and uninstall software.
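For reference, Chocolatey is installed from an administrator PowerShell with a one-liner from its own website. The version below is the one the site gave at the time; check chocolatey.org for the current command:

```powershell
# Installs Chocolatey; the exact one-liner may have changed, see chocolatey.org
Set-ExecutionPolicy Bypass -Scope Process -Force
iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
```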

From PowerShell we had to install Cygwin, Node, and Make:
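The command was along these lines (the exact Chocolatey package names are my best guess):

```shell
$ choco install cygwin nodejs make
```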

PowerShell was then closed and Cygwin was opened. The command below was used to set Cygwin to use our Windows home directory. Cygwin was then closed and reopened. This was the terminal used for the rest of the exercise.
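The usual way to point Cygwin at the Windows home directory is a line in its nsswitch.conf file, something like:

```shell
$ echo 'db_home: windows' >> /etc/nsswitch.conf
```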

A terminal is an interface used to type and execute text based commands.

Keeping track of our steps

The exercise has a lot of steps, and each step produces a new file. A code editor is needed to write the steps. I used Atom.

To avoid having to run a load of steps again if one part was incorrect, a tool called Make was used.
Make allows users to create rules — which are made up of
  • a target, which is name of the file (or directory) that will be produced
  • the dependencies, which are the names of zero or more files required for the rule to run
  • a list of commands required to transform the dependencies into the target
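As a sketch, a rule with those three parts looks like this (the file names here are just placeholders; note that the command lines must be indented with a real tab character):

```make
# target: dependencies
#	commands
output.txt: input.txt
	sort input.txt > output.txt
```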

Make can work out intermediate steps. Even if we have nothing made yet, we can make the first rule, which then makes that rule’s dependencies, which in turn makes the dependencies of those rules, and so on.

This means that if a change is made in any one of the steps, we can simply delete all the data and make the last rule. All the intermediate files will be made too.

Atom was used to create a new file called Makefile in the project directory.

Getting the geometry

The source geometry for the London borough boundaries is the first bit of data required. The Greater London Authority publishes exactly this on the London Datastore.

The exercise was to download the data using the first Make rule:
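The original rule’s download link isn’t reproduced here, but it followed this shape (the URL placeholder stands in for the Zip file’s address on the London Datastore):

```make
source:
	mkdir -p source
	curl -L -o boundaries.zip '<address of the boundaries Zip file on the London Datastore>'
	unzip -d source boundaries.zip
	rm boundaries.zip
```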

The first line describes the name of the target, which is a directory named source.

First, a new directory is made, then the Zip file is downloaded. The file is unzipped into the source directory, and the Zip file is removed.

The following was typed into the terminal to run the rule.

What have we got?

Listing the contents of the directory ‘source’ [$ ls source] would show a number of datasets in the Shapefile format. This is commonly used by agencies and governments to distribute geographical data.

The main Shapefiles have the shp extension. The files with other extensions contain other linked data.

The dataset for this exercise is London_Borough_Excluding_MHW. This has the geometry for each of the London boroughs excluding the mean high water (MHW). It also has the City of London, which technically isn’t a London borough.

MapShaper can be used to preview Shapefiles.

 

Converting to GeoJson

Despite being a common format, Shapefile isn’t a very easy format to work with. The next step is to convert it to GeoJson. This file type is more suited to manipulation and to building interactives.

The next rule in the Makefile is:
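A sketch of the rule, based on the dataset name mentioned earlier:

```make
london-1.geo.json: source
	shp2json source/London_Borough_Excluding_MHW.shp \
	> london-1.geo.json
```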

This will create a new target called london-1.geo.json. The shp2json tool converts Shapefiles to GeoJson.

This tool also reads from some of the other files that came with the shp file. This means that the GeoJson file also contains data on each of the boroughs, including a unique ID for each. These IDs are needed later.

The \ character allows us to move onto a new line for the next part of the command.
The > character allows us to redirect the output from shp2json to a new file. This is known as output redirection.

This rule now needs to be run [$ make london-1.geo.json].


British National Grid

The data was originally produced by Ordnance Survey, the Government-owned national mapping agency for Great Britain. (Northern Ireland is mapped separately by Ordnance Survey of Northern Ireland.)

Ordnance Survey data uses a coordinate system known as the British National Grid (BNG).

Coordinate systems

All coordinates have to be given in a reference system. Unfortunately there are many, as different systems are more accurate for different parts of the world. New systems also appear as accuracy improves, accounting for the shape of the Earth, sea levels and tectonic plate movement.

There are two types of coordinate reference system:

  1. Geographical coordinate reference systems: Coordinates given as latitudes and longitudes
  2. Projected coordinate reference systems: Coordinates given as locations on a flat surface (called X and Y). Here a projection converts a location on a sphere to a location on a flat surface. Different projections exist which make different trade-offs about how to display the data.

BNG is a projected coordinate reference system and is particularly good at representing Great Britain. It just isn’t any good for anywhere else in the world.

Most tools expect data to use a geographical coordinate reference system called WGS84, or the World Geodetic System.

This system works well for locations across the globe.

For this exercise BNG needs to be converted to WGS84.
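The rule looked something like this. EPSG:27700 is BNG and EPSG:4326 is WGS84; the exact spelling of the reproject flags is from memory:

```make
london-2.geo.json: london-1.geo.json
	reproject --use-spatialreference --from EPSG:27700 --to EPSG:4326 \
	< london-1.geo.json \
	> london-2.geo.json
```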

This rule will create another target, called london-2.geo.json.

It calls the reproject tool we installed with NPM earlier, which converts from one projection to another. For this to happen it has to be passed flags giving the EPSG numbers for each of the two systems.

The spatialreference flag is also passed; this tells it to look up more unusual projections that it doesn’t already know about, such as BNG, using an online reference.

European Petroleum Survey Group  (EPSG)

EPSG numbers are widely used by tools to identify the different coordinate reference systems.

EPSG.io can be used to look up which numbers refer to which systems.

Flags tell a command line tool what we want it to do. Flags can be mandatory (like --from and --to here), without which the program would have no idea what to do, or optional (like --use-spatialreference).

In this case the flags are long, which means they have more than one letter and start with two dashes.

The short flag format is more commonly used, with a single letter, such as -g. Unlike long flags these can be combined — so saying -gh is the same as saying -g -h.

The target now needs to be run [$ make london-2.geo.json].

Now the data is in WGS84, a projection needs to be applied to it before it can be displayed on a screen. Many news organisations have a house projection they use in their graphics. One of the more common is Robinson.

The map is also going to be resized to fit into a 1000×800 pixel area. Scaling the map now will make things easier when it comes to scaling it up and down based on the size of the browser.

These are also roughly the proportions of London, which is wider than it is tall.
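A sketch of the rule, assuming the Robinson projection and the 1000×800 size mentioned above:

```make
london-3.geo.json: london-2.geo.json
	geoproject 'd3.geoRobinson().fitSize([1000, 800], d)' \
	< london-2.geo.json \
	> london-3.geo.json
```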

This creates london-3.geo.json. Inside, it calls geoproject, which came with the d3-geo-projection set of tools we installed at the start.

This tool is part of D3, the library for building data-driven visualisations that will be used to display the data. It assumes the input data will be in WGS84.

Unusually, this rule runs the data through a string of Javascript, which has D3 already included.

D3’s projection library is used to create the function needed to transform our data. This function is then applied to the data, which is given the one-letter name d here.

The target has to be run before the next step [$ make london-3.geo.json]

What have we got?

We know that the Shapefile came with other files containing other information, but what information? This can be seen by printing the file out [$ cat london-3.geo.json].

cat (short for concatenate) prints a file out into the terminal. If two or more file names are given it will print all their respective contents out together.

So there’s a lot of numbers in it, which isn’t that helpful. As it is in a form of Json (a structured format) it can be displayed in a more human-readable way.

For this, jq needs to be installed. [$ choco install jq]

However, I never managed to get this installation to work. The exercise mentioned using Homebrew, but this isn’t available for Windows, only Mac OSX. But Chocolatey should have been a suitable alternative.

So I tried installing it directly from the jq website. But that didn’t work either. However, I did find an online Json viewer.

I have no idea if this is close to what was expected by running jq.

The online version does show a structure to the data.

The file is a FeatureCollection, which contains an array of Features. Each Feature is a London borough, starting with Kingston upon Thames. There should be similar data for every borough.

Properties: Each Feature contains some properties, and then a list of the coordinates giving the exact geometry of the outline of that borough.

Each borough has a NAME, GSS_CODE, HECTARES, and some other information.
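An illustrative sketch (not verbatim) of what one Feature looks like; the values here are made up and the coordinates elided:

```json
{
  "type": "Feature",
  "properties": {
    "NAME": "Kingston upon Thames",
    "GSS_CODE": "E09000021",
    "HECTARES": 3726.1
  },
  "geometry": { "type": "Polygon", "coordinates": [ ... ] }
}
```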

Feature: A geographical term, generally describing something that you want to display on a map. This can be anything from a river or a road to the boundary of an electoral ward.

The GSS codes are unique nine-character codes which are given to all administrative areas in the UK by the ONS.

As these identifiers are widely used across government and other organisations, they provide the key to linking the geographical data to other data sets.

Converting to ND-Json

It is possible at this point to skip straight to visualising the geometry. But often the geographical data isn’t the full story. It needs to be modified, filtered down to specific parts, or combined with another dataset.

One way to do this is by using the ndjson-cli suite of tools. These tools are based on the ND-Json format.

 

The ND-Json standard itself isn’t anything specific to geographical data.
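The rule itself was along these lines:

```make
london-4.ndjson: london-3.geo.json
	ndjson-split 'd.features' \
	< london-3.geo.json \
	> london-4.ndjson
```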

The rule will create london-4.ndjson. It does this by calling ndjson-split. This splits apart the array named features inside the FeatureCollection we saw in the previous step. This simply takes each feature from that array and puts it on a new line.
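In plain Javascript, the splitting amounts to something like this (the two-borough collection here is made up):

```javascript
// A made-up FeatureCollection with two boroughs
const collection = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { NAME: "Camden" }, geometry: null },
    { type: "Feature", properties: { NAME: "Hackney" }, geometry: null }
  ]
};

// One JSON document per line: this is the ND-Json output
const ndjson = collection.features.map(f => JSON.stringify(f)).join("\n");
console.log(ndjson);
```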

The < character is the opposite of the output redirection (>) seen in an earlier rule.

It is called input redirection. It takes a file and routes it into a program. It’s much rarer as most programs take a file name as part of their input, but the ndjson-cli tools don’t for some reason.

Make this new target [$ make london-4.ndjson], remembering that the extension is now .ndjson.

$ cat london-4.ndjson | less -S shows one borough per line. The -S flag prevents the text being wrapped.

Joining up boroughs with average income

This step is to join the geometry with a dataset published by HMRC giving the average income of tax-payers in each London borough.

The exercise gives the instruction:

$ curl -O https://files.datapress.com/london/dataset/average-income-tax-payers-borough/2016-04-05T08:55:06/income-of-tax-payers.csv

However, I downloaded the file from the website as I had trouble with this instruction.

The csv2json tool from the d3-dsv suite of tools is used to convert the CSV file to Json:
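The command was roughly as follows; the exact encoding name is my assumption:

```shell
$ csv2json --input-encoding windows-1252 \
< income-of-tax-payers.csv \
> income-of-tax-payers.json
```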

The --input-encoding flag indicates that the input file is Windows encoded. If this wasn’t done some characters (such as £ signs) would come out as question marks or other junk.

This means that csv2json knows to convert the file to Unicode, the modern standard for file encoding.

Next, the Json is split apart into ND-Json with the same method used earlier.

$ ndjson-split 'd' \
< income-of-tax-payers.json \
> income-of-tax-payers.ndjson

$ cat income-of-tax-payers.ndjson | less -S gives:

Join then transform

 

The next two steps are to join and to transform. The joining is merging the two sets of data together (london-4.ndjson and income-of-tax-payers.ndjson).
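A sketch of the join rule; the name of the code column in the income file is a guess on my part:

```make
london-5.ndjson: london-4.ndjson income-of-tax-payers.ndjson
	ndjson-join 'd.properties.GSS_CODE' 'd["Area Code"]' \
	london-4.ndjson income-of-tax-payers.ndjson \
	> london-5.ndjson
```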

This uses ndjson-join, which merges the two files into one, based on the GSS codes. Looking at the new file [$ cat london-5.ndjson | less -S], each line now has a two-element array on it: the first element is the geometry for a borough, the second is the income data.

This isn’t quite right: the income data needs to be inside the properties section, which is the space GeoJson gives for data which is not related to the geometry.

This uses ndjson-map to transform each line of the data.
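The rule was along these lines, using the ndjson-map expression quoted in the Confession section further down:

```make
london-6.ndjson: london-5.ndjson
	ndjson-map 'd[0].properties = { code: d[0].properties.GSS_CODE, name: d[0].properties.NAME, incomeMedian: d[1]["Median £ - 2013-14"] }, d[0]' \
	< london-5.ndjson \
	> london-6.ndjson
```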

The expressions d[0] and d[1] are the two datasets. Then the properties of the first dataset are set to include the GSS code from the geometries (which is given the name code), the name of the borough (name), and the 2013-14 medians from our income data (incomeMedian). It then returns the modified d[0].
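To see what that expression does, here it is applied to one joined line in plain Javascript. The GSS code, income figure and column name here are illustrative, not taken from the real data:

```javascript
// One joined line: [feature, incomeRow], with made-up values
const d = [
  {
    type: "Feature",
    properties: { GSS_CODE: "E09000001", NAME: "City of London", HECTARES: 314.9 },
    geometry: null
  },
  { "Median £ - 2013-14": "58300" }
];

// Replace the feature's properties with just the three fields we want
d[0].properties = {
  code: d[0].properties.GSS_CODE,
  name: d[0].properties.NAME,
  incomeMedian: d[1]["Median £ - 2013-14"]
};
const result = d[0];
console.log(result.properties);
```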

Both rules need to be made before the next step.

Converting to TopoJson

The size of a geographical file can be a problem when displaying it on the internet, especially for mobile users.

The following command will show the size of london-6.ndjson:

$ ls -lh

This lists the files in the folder in ‘long’ (l) format. This shows the file size in human-readable form (h), in megabytes/kilobytes instead of just bytes.

london-6.ndjson should be around 1.8 megabytes.

To reduce the size, the data will be converted to TopoJson.
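The conversion rule was roughly this, using geo2topo from the topojson tools. The -n flag tells it the input is newline-delimited; the object name boroughs is my choice, not necessarily the one used in the exercise:

```make
london-7.topo.json: london-6.ndjson
	geo2topo -n boroughs=london-6.ndjson \
	> london-7.topo.json
```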

Reducing file size

Further space can be saved by simplifying the geometries, discarding some of the data’s fine detail where it is too small to be meaningful.

The following conversion will simplify borders without ending up with gaps or overlapping of the boundaries.
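The simplification rule was along these lines:

```make
london-8.topo.json: london-7.topo.json
	toposimplify --planar-area 2 \
	< london-7.topo.json \
	> london-8.topo.json
```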

toposimplify is from the topojson tools; the argument --planar-area 2 indicates the amount of map simplification required.

The next step is to quantize the data, which reduces the accuracy of the coordinates used in the geometries. This gives london-9.topo.json.
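The quantize rule was roughly:

```make
london-9.topo.json: london-8.topo.json
	topoquantize 1e3 \
	< london-8.topo.json \
	> london-9.topo.json
```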

Running [$ ls -lh] again can be used to compare the file sizes of london-7, -8 and -9 with london-6.ndjson.

This uses another topojson tool, topoquantize. Picking a number is another tradeoff between accuracy and file size. In this case 1e3 is used. The only real way to work out the right number for any given dataset is by trial-and-error.

Confession

I had trouble with making london-6.ndjson. This meant that the final map (the next couple of stages) did not show the median income. I changed the column label in the csv file to ‘A’ and amended the rule by changing

'd[0].properties = { code: d[0].properties.GSS_CODE, name: d[0].properties.NAME, incomeMedian: d[1]["Median £ – 2013-14"] }, d[0]' \

to

'd[0].properties = { code: d[0].properties.GSS_CODE, name: d[0].properties.NAME, incomeMedian: d[1]["A"] }, d[0]' \

Building an interactive map

Again, I had trouble getting this to work. Apparently it isn’t possible in Windows and a server is needed. I did seriously think about buying server access just to finish this exercise.

I did ask for help using the Journocoders’ Slack pages, but no help came my way. As I mentioned, this exercise is about building up a network of people I can help and who can help me.

I had forgotten about the network of small business owners I had built up over 12 years of running a business. I asked one contact whether this interactive map could be run on a Windows machine, and she said that it is possible if Firefox is used.

So if you are a Windows user, make sure that you use Firefox to run the map.html file.

The data-visualisation library D3 will be used to build a simple interactive visualisation showing the map and the income data.

To do this a basic HTML page needs to be created.
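The page itself isn’t reproduced here, but a minimal version consistent with the description below (an empty <h2>, an empty <main>, and the css and js files linked in) would look something like this; the D3 version is an assumption:

```html
<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <title>London median income</title>
  <link rel="stylesheet" href="map.css">
</head>
<body>
  <h2></h2>
  <main></main>
  <script src="https://d3js.org/d3.v4.min.js"></script>
  <script src="map.js"></script>
</body>
</html>
```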

This should be saved as map.html in your source directory, along with the following css and js files:

SVG: Scalable Vector Graphics. A format similar to HTML, but for producing vector graphics instead of web pages.

Vector: Graphics drawn using points and lines instead of pixels. This means they can be scaled up or down to any size. Images based on pixels are called raster graphics.

map.js starts by creating a new function called visualise(), and ends with two event listeners. These event listeners tell the browser to call the function when the page first loads and whenever it is resized.

The map is rendered inside the function, by loading in london-9.topo.json. A couple of variables are defined.

The target points at that empty <main> element in our HTML file. The sizeRatio reflects the rough ratio of London defined earlier. The map is also scaled down, which is what the scaleRatio specifies.

The height is based on the width and the scaleRatio. The scaleRatio is also used to compute a scaledWidth and scaledHeight for the graphic.

These numbers will modify the fixed size of the graphic (1000×800) based on the actual size of the browser (the width and height variables).

D3 only understands GeoJson, so the data needs to be converted.
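The conversion is one call to the topojson client library. Assuming the object was named boroughs when geo2topo was run (my naming, not necessarily the exercise’s), it looks something like:

```javascript
// Sketch: convert the TopoJson object back to GeoJson for D3
// (assumes the topojson client library is loaded on the page)
var boroughs = topojson.feature(data, data.objects.boroughs)
```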

Finally a new group is appended to the <svg>, and we tell it how it should be transformed based on the variables calculated.

Then it creates new <path> elements, which will contain the geometries for each of the boroughs. For each path three listeners are added:

  1. When the mouse moves over a borough, adds a new ‘selected’ class
  2. Removes that class when the mouse moves out of that borough
  3. Is triggered when a borough is clicked. This extracts the name and income median value from the data, and then inserts them into the empty <h2> element in the HTML

Use Firefox to open map.html.

If you use Linux or OSX, then the full set of instructions can be found on the Journocoders’ Meetup page for the February meeting.

Publishing the code online

I did struggle with this one: free websites such as Google Sites and Wix would not allow the JavaScript. My tutor suggested using GitHub. I had already tried this, but I couldn’t get it to work. I finally realised that the .js file couldn’t find the london-9.topo.json file. Doh!

I still had problems when I was trying to add the code to GitHub. I was using Dropbox to host the .js, .css and london-9.topo.json files. These files were usable when running map.html from my hard drive.

But only the <body> part of the html code was working.

In the end I followed a YouTube video by Pear Crew. By creating a new GitHub repository online, and then cloning it to the GitHub desktop program, it was easy to create the webpage shown below (it doesn’t work with IE 11).

So finally, I had the map online. It needs a bit of a facelift, but it works, which is something I thought would never happen.

The Pear Crew’s video is below the map.