Storytelling from Space: Tools/Resources

This list of resources is all about acquiring and processing aerial imagery. It's generally broken up in three ways: how to go about this in Photoshop/GIMP, with command-line tools, or in GIS software, depending on what's most comfortable for you. Often these tools can be used in conjunction with each other.

Note: formatting is a bit weird here. An identical, more legible version can be found at http://bit.ly/mozfest-space

Acquiring Landsat & MODIS

Web Interface

Scripting

Processing Landsat

Background

Photoshop

GIMP

Scripting

GDAL - Geospatial Data Abstraction Library & ImageMagick (command-line photoshop)
Orfeo - More advanced. Do things like atmospheric correction and NDVI.
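
As a taste of the scripting route, here's a minimal sketch of a Landsat 8 true-color workflow using GDAL and ImageMagick. The band filenames are hypothetical, and the scale and contrast values are starting points to tune per scene:

    ```
    # Stack the red, green, and blue bands into a single RGB GeoTIFF
    gdal_merge.py -separate -o rgb.tif LC8_B4.TIF LC8_B3.TIF LC8_B2.TIF

    # Reproject to web mercator while georeferencing is still intact
    gdalwarp -t_srs EPSG:3857 rgb.tif rgb-3857.tif

    # Scale the 16-bit values down to 8-bit for display
    gdal_translate -ot Byte -scale rgb-3857.tif rgb-8bit.tif

    # Brighten with a sigmoidal contrast stretch; ImageMagick drops
    # georeferencing, so keep this as the last step
    convert -sigmoidal-contrast 50x16% rgb-8bit.tif rgb-final.tif
    ```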

GIS

QGIS - Open Source alternative to ArcGIS
  • Install

    ```
    brew tap homebrew/science
    brew install qgis24 --with-grass7 --with-orfeo
    ```

    May need XQuartz: https://xquartz.macosforge.org/landing

    May also need:

    ```
    sudo pip install psycopg2
    echo 'export PYTHONPATH=$PYTHONPATH:/usr/local/lib/python2.7/site-packages' >> ~/.bash_profile
    ```

Creating Web Imagery

Storytelling Tools

Storytelling from Space, In the wild

Misc Geospatial Tools

Bonus


Querying the Sum of All Human Knowledge - Resources

The official Wikipedia API is limited in terms of data extraction. You can request page content and metadata but there's no great way to query for data across pages or relationships between pages, and Wikipedia "categories" can become stale if they're not updated manually.
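
For reference, a typical API call pulls content one page at a time, something like this (parameters illustrative):

    https://en.wikipedia.org/w/api.php?action=query&titles=Robert_De_Niro&prop=extracts&format=json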

This is where semantic web projects come in. DBPedia and Freebase regularly extract data from Wikipedia and give it structure, essentially turning Wikipedia content and related resources into a giant queryable database. Below is a list of resources to extract data from Wikipedia via semantic web sources, including DBPedia (using SPARQL) and Freebase (using MQL).

Before doing queries, it's important to characterize Wikipedia in terms of contributor demographics and topic distribution, in order to understand its strengths as a data source. Very generally, its contributor base skews toward Western, English-speaking males.

My feelings on how to approach this: Freebase (using MQL) is a bit easier to learn than SPARQL, and a good way to get introduced to semantic web concepts. But SPARQL has become a standard; it's the official query language of the semantic web and is here to stay. I'm uneasy about the future and stability of Freebase, given that it's now owned by Google. It's also worth looking at Wikidata, a Wikimedia project that is attempting to make parts of Wikipedia more data-driven across languages and lists. However, it's a younger project than Freebase and DBPedia and doesn't yet have an advanced query API.

I talked about all this at CSVConf 2014 in Berlin, video here:
https://www.youtube.com/watch?v=NhhJmgXKSJI

Start querying (also scroll down for more query examples!)

Wikipedia stats

Wikipedia info

Wikipedia API

Freebase resources (also see MQL queries below)

Freebase datasources

[{
  "id": null,
  "name": null,
  "type": "/dataworld/mass_data_operation",
  "timestamp": null,
  "sort": "-timestamp",
  "a:operator": [{
    "id": "/m/0qs4g2b",
    "optional": "forbidden"
  }],
  "b:operator": [{
    "id": "/m/0j3vmzv",
    "optional": "forbidden"
  }],
  "c:operator": [{
    "id": "/m/0j_5_4x",
    "optional": "forbidden"
  }],
  "d:operator": [{
    "id": "/m/021y7rx",
    "optional": "forbidden"
  }]
}]

DBPedia resources

SPARQL Endpoints:

SPARQL Resources (also see SPARQL queries below)

Wikidata resources

Linked data in the wild

Misc

Queries

MQL queries - Paste into https://www.freebase.com/query

  • CAST OF ALL 1990s ROBERT DENIRO AND JOE PESCI MOVIES DIRECTED BY MARTIN SCORSESE, WITH THE NAME IN SPANISH IF POSSIBLE
  [{
    "a:starring": [{
      "actor": "Robert de Niro"
    }],
    "b:starring": [{
      "actor": "Joe Pesci"
    }],
    "c:starring": [{
      "actor": null
    }],
    "directed_by": [{
      "name": "Martin Scorsese"
    }],
    "initial_release_date": null,
    "initial_release_date>=": "1990",
    "initial_release_date<=": "1999",
    "a:name": {
      "value": null,
      "lang": "/lang/es",
      "optional": true
    },
    "b:name": null,
    "mid": null,
    "type": "/film/film"
  }]
  • Male american actors born 1943
[{
  "id": null,
  "name": "actor",
  "type": "/people/profession",
  "people_with_this_profession": [{
    "name": null,
    "/people/person/nationality": [{
      "id": "/en/united_states"
    }],
    "/people/person/gender": [{
      "id": "/en/male"
    }],
    "/people/person/date_of_birth>=": "1943",
    "/people/person/date_of_birth<": "1944",
    "/people/person/date_of_birth": null
  }]
}]
  • DeNiro's awards
[{
  "id": null,
  "name": "actor",
  "type": "/people/profession",
  "people_with_this_profession": [{
    "name": "Robert De Niro",
    "/award/award_winner/awards_won": [{
      "*": null
    }]
  }]
}]
  • Disasters
[{
  "id": null,
  "name": null,
  "type": "/event/disaster",
  "fatalities": {
    "value>": 10000,
    "value": null
  },
  "sort": "-fatalities.value"
}]

SPARQL queries - Can be pasted into SPARQL or SNORQL endpoints

  • 100 De Niro movies
PREFIX : <http://dbpedia.org/resource/>  
PREFIX dbprop: <http://dbpedia.org/property/>

SELECT ?movie  
WHERE {  
     ?movie dbprop:starring :Robert_De_Niro}
ORDER BY ?movie  
LIMIT 100  
  • Geo query
SELECT ?subject ?label ?lat ?long WHERE {  
?subject owl:sameAs <http://dbpedia.org/resource/Eiffel_Tower> .
<http://dbpedia.org/resource/Eiffel_Tower> geo:lat ?lat.  
<http://dbpedia.org/resource/Eiffel_Tower> geo:long ?long.  
<http://dbpedia.org/resource/Eiffel_Tower> rdfs:label ?label . }  
  • Counting geo query results
SELECT (COUNT(*) AS ?count)  
  WHERE{
    ?place rdf:type dbpedia-owl:Place .
    ?place foaf:name ?title .
    ?place geo:lat ?geolat .
    ?place geo:long ?geolong .
  }
  • Landlocked european countries with population filter
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX type: <http://dbpedia.org/class/yago/>  
PREFIX prop: <http://dbpedia.org/property/>  
SELECT ?country_name ?population  
WHERE {  
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?population .
    FILTER (?population > 15000000 && LANG(?country_name)='en') .
}

  • NASA ships
SELECT ?p ?o WHERE  
{ 
  <http://nasa.dataincubator.org/spacecraft/1968-089A> ?p ?o
}
  • Movies, english descriptions
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX foaf: <http://xmlns.com/foaf/0.1/>  
SELECT ?film ?description ?film_name WHERE {  
    ?film rdf:type <http://dbpedia.org/ontology/Film>.
    ?film foaf:name ?film_name.
    ?film rdfs:comment ?description .
    FILTER (LANG(?description)='en')
}
LIMIT 100  
  • Alan Alda properties
SELECT ?property ?hasValue ?isValueOf  
WHERE {  
  { <http://dbpedia.org/resource/Alan_Alda> ?property ?hasValue }
  UNION
  { ?isValueOf ?property <http://dbpedia.org/resource/Alan_Alda> }
}
  • Counting in SPARQL
(COUNT(DISTINCT ?instance) AS ?count)
  • Space missions
PREFIX dbowl: <http://dbpedia.org/ontology/>  
PREFIX dbpprop: <http://dbpedia.org/property/>  
PREFIX dbres: <http://dbpedia.org/resource/>

SELECT ?z WHERE {  
 ?z a dbowl:SpaceMission 
}

LIMIT 10  
  • European countries
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  
PREFIX yago: <http://dbpedia.org/class/yago/>  
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?place WHERE {  
    ?place rdf:type yago:EuropeanCountries .
    ?place rdf:type dbpedia-owl:Country
}
  • Pablo Picasso painting locations (factforge)
# Cities where paintings of Picasso are located (PROTON)
PREFIX dbpedia: <http://dbpedia.org/resource/>  
PREFIX ff: <http://factforge.net/>  
PREFIX ptop: <http://www.ontotext.com/proton/protontop#>  
PREFIX pext: <http://www.ontotext.com/proton/protonext#>

SELECT DISTINCT  ?painting_l ?owner_l ?city_l  
WHERE {  
     ?painting pext:authorOf dbpedia:Pablo_Picasso.
     ?painting ptop:isOwnedBy ?owner ; ff:preferredLabel ?painting_l.
     ?owner ff:preferredLabel ?owner_l .
     ?owner ptop:locatedIn ?city.
     ?city a pext:City ; ff:preferredLabel ?city_l .
     OPTIONAL { ?city ptop:subRegionOf ?otherLoc. ?owner ptop:locatedIn
?otherLoc. ?otherLoc a pext:City }
     FILTER (!bound(?otherLoc))
}

John Cusack on becoming a 2015 Knight-Mozilla Fellow

"I don't want to sell anything, buy anything, or process anything as a career. I don't want to sell anything bought or processed, or buy anything sold or processed, or process anything sold, bought, or processed, or repair anything sold, bought, or processed. You know, as a career, I don't want to do that." ~John Cusack in Say Anything

In this timeless moment of modern cinema, John Cusack then goes on to explain that kickboxing is his current career choice, which I think is a really good choice. But if he were more into computers, graphics, data or code, I think there's a better option for him: he could have been a Knight-Mozilla Fellow. Right now, I'm like a cyber John Cusack, and I'm holding up my boombox.

More of a proverbial John Cusack holding a proverbial boombox. Credit: Internet.

I don't have a journalism background. I've designed, developed, and visualized for companies and academia. I wasn't entirely disenchanted with my previous jobs, but I increasingly wanted to make things for the sake of knowledge and people, not for a client or a sponsor's agenda. I wanted to make things more relevant to humanity, not to an industry. I wanted to focus on science, the environment and our intersection with it, and let people explore data and concepts in visual and interactive ways. I couldn't help but notice all the amazing interactive work the New York Times was releasing, and the fellowship seemed like a way in, with freedom to explore the topics I care about. So I applied two years in a row, first from Philly to no avail, and then from Singapore to maximum avail. Now I'm at ProPublica in New York, over halfway through the placement.

2014 applicant map

2014 applicants. I'm the southernmost dot in Asia. Credit: OpenNews

I'm working on a project that involves visualizing NASA data: integrating with repositories of satellite imagery, processing it in Photoshop and on the command line, and making it interactive in a news application, helping to create what I hope will be something really beautiful and worthwhile to explore. Working with data from space is basically the coolest thing I could be doing right now. Did I expect to be doing this? Not really. All I did was follow my interests, because I have less of a job description and more of a general mandate to work with incredibly smart people and make interesting things.

Earth is my favorite. Credit: NASA

Along the way, another unexpected thing happened. Part of all this is traveling around the world, going to conferences and workshops, and I was accepted to talk at some. Preparing for and talking in front of crowds is really hard for me, but I didn't realize how much I'd appreciate the outcome. It inspires people to come over and talk about things they're also interested in, and I've met a lot of people I never would have otherwise. I just got back from csv,conf and the Open Knowledge Festival in Berlin and feel totally energized.

About to talk about my second favorite thing, Wikipedia. Credit: @gaba

You don't have to be a news junkie or feel like a hotshot to become one of next year's fellows. If you like making digital things, want a change of pace from where you're at, and are at all curious about what goes on in newsrooms around the world, don't hesitate to apply. You have until August 16th.

Learn more: http://opennews.org/fellowships
Apply here: http://opennews.org/fellowships/apply.html


2014 Civic Media Link Dump

Links and projects overheard at the 2014 MIT-Knight Civic Media conference


10,000 features

There are plenty of reasons not to animate 10,000 things simultaneously. It can be a lot to process, with our minds and with our browsers. Maybe a topic would be better expressed more simply, filtered and aggregated and streamlined. But maybe the point is the chaos. Maybe it's not about creating a tightly designed narrative but about running a simulation. Sometimes a dataset's complexity is the whole point, and patterns only emerge when you can see the larger-scale dynamics, as in cities or galaxies. There are plenty of reasons to animate 10,000, 100,000, millions and billions of stars.

In web browsers, options are limited. Either we can pre-render a video and serve that up, or use WebGL for interactivity. Throw more than ~1,000 animated features into an SVG or the DOM and the framerate dips quickly. And upwards of 5,000 features in a 2D canvas seems to be the endgame. There are a lot of casual benchmarks out there measuring this phenomenon.
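
To get a feel for what those benchmarks measure, here's a minimal sketch of one: animate N squares on a 2D canvas and raise N until the framerate dips (assumes a <canvas> element on the page):

    // Animate N particles on a 2D canvas; raise N until the framerate dips.
    var canvas = document.querySelector('canvas'),
        ctx = canvas.getContext('2d'),
        N = 5000,
        particles = [];

    for (var i = 0; i < N; i++) {
      particles.push({
        x: Math.random() * canvas.width,
        y: Math.random() * canvas.height,
        vx: Math.random() - 0.5,
        vy: Math.random() - 0.5
      });
    }

    (function tick() {
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      for (var i = 0; i < N; i++) {
        var p = particles[i];
        // Wrap around the edges so particles stay on screen
        p.x = (p.x + p.vx + canvas.width) % canvas.width;
        p.y = (p.y + p.vy + canvas.height) % canvas.height;
        ctx.fillRect(p.x, p.y, 2, 2);
      }
      requestAnimationFrame(tick);
    })();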

When considering urban data and geospatial libraries, 10,000 features rules out much of Leaflet's and D3's native functionality. Meanwhile, there are demos out there showing 1 or 2 million particles animating independently. There's a huge gap there. But a specialized JS library and a WebGL tech demo are very different beasts. It takes a special kind of coder to embrace shader code and achieve that level of performance, and an even more motivated one to translate that into usefulness, in a reusable library. Beyond shader code, even the more straightforward WebGL API is hard to navigate. Even Three.js, the most popular WebGL library, which abstracts further away from raw WebGL code, is still difficult.

Interactive web visualization didn't really catch on until D3 came along. Raphael has been around for a while and still does a great job of making SVGs accessible, but as a general-purpose shape library, it's less opinionated about visualization and handling data than D3. Three.js is also really useful, but it's akin to Raphael: essentially a shape/texture/scene library. It's less approachable to someone who wants a quick data-driven win, and it doesn't have practical examples for data or geospatial visualization. But there are projects trying to bridge that gap.

In a geospatial context, there are some maturing options out there that render with WebGL. I had the indistinct pleasure of using CesiumJS at the Senseable City Lab. We needed all the fixins of a "traditional" slippy map, including an endlessly zooming global base map, plus a mechanism to animate data over time and space. Cesium provides a lot out of the box as a spatiotemporal 3D globe library, seemingly geared toward aerospace visualization. It's powerful, but it's less than elegant. Its default controls are downright ugly and not so straightforward to de-style. At high zoom levels it suffers from tile rendering artifacts, sometimes getting stuck between zoom levels and not loading in a smooth tileset. There are also aspects that are frustrating to override; for instance, tweaking the appearance of the earth's harsh blue atmosphere requires hacking into vertex shader code. That said, we successfully animated 5,000-10,000 simple, interactive geographic features, such as scaled circles and arcs over bus stop locations, representing ridership. Cesium is a project that could use some design and documentation improvements, but it's going somewhere.

Not to be ignored is the long-awaited OpenLayers 3 and its WebGL renderer. It will support vector tile layers, vastly expanding the possibilities of web cartography. But as of Aug 2013, its WebGL visualization layer was not at all ready for primetime; it felt more like a hack than a feature. I have hope. Mapbox also talks about its next-gen vector map renderer, but suspiciously absent from the discussion are its visualization tools beyond the base map.

Then there's Vizicities. They have impressive videos and one nice demo. It's a specialized geospatial library geared towards urban visualization and all the overhead that that assumes, with streets and building models. They've recently open sourced their code, but it all seems a bit in its infancy.

More geared towards visualization is PhiloGL. It's more catered to a data-driven crowd than Three.js, but unfortunately it doesn't look like it's gotten much attention in the past few years...

Also of note is PathGL, a WebGL renderer for D3, which I haven't explored deeply.

WebGL is easy to hype as the future of the web. It's sexy-looking and powerful, with increasing browser adoption, but it still hasn't really caught on beyond the realm of experts. It's powerful because it's relatively low-level. But low-level is the opposite of accessible, and the killer library hasn't landed yet. So there's an opportunity there. This is all an area I'm looking to explore more in the coming months. Time to get an education.


The State of The State of the Map

People summarize OpenStreetMap (OSM) as the Wikipedia of maps: an open, freely editable, crowdsourced compendium of global knowledge. But up close, they are very different projects, as I just learned at State of the Map in DC, the annual US OpenStreetMap conference. Back in 2012 I got a similar slice of Wikipedia and its eclectic contributor community at the Wikimania conference.

If OpenStreetMap looks like a map, it's because it is a map. But that's just the final output. People talk about it in a lot of ways:

As a datasource

David Blackman from Foursquare talked about a lot of things, including a bit on Quattroshapes, the fourth iteration (see Alphashapes, Betashapes, Zetashapes) of international boundaries (like neighborhoods) derived from many sources, including OSM, Flickr, and gov't boundaries. It's useful for something like Foursquare, which needs more subjective boundaries for relevant geographic searches.

Kevin Webb (https://twitter.com/kvnweb) used OpenTripPlanner/OpenTripAnalyst in a really impressive demo. It combined OSM road and transit networks with GTFS data to visualize travel distance according to various types of job availability.

And then everyone and their mother has a faster geocoder than you, based on OSM data. But because there isn't enough address data in the database, pure OSM geocoders are still only reliable in a few cities like Chicago and New York, whose open data policies vibe well with OSM's license and allow bulk address imports. Something to that effect. The licensing issue was on a handful of presenters' minds there. Alex from Mapbox can explain his rationale for a license change, although I'd feel more comfortable about his argument if it weren't coming from the fast-growing commercial player that is Mapbox.

As a platform

A bunch of interesting projects use OSM for data collection. The National Park Service (NPS) runs its own instance of the really nicely designed iD OSM editor to accept points of interest (POIs) within parks, for entry into an internal NPS database, to be cleaned and validated and eventually pushed back out to OSM. All part of an open data policy, improving OSM and NPS data simultaneously.

As context

Mapbox and Stamen kind of dominated this conference in the design department, demoing maps for a ton of specialized visualization contexts: really nice multi-scale cartography with TileMill and CartoCSS, and a glimpse into a WebGL vector-tile future. Also some nice entries from the LA Times data desk with Quiet LA and Silent LA.

Overall there was a bit of uncomfortable overlap between the sponsors, the exhibitors, and the presenters, considering that OSM owes its existence to volunteers, although it speaks to OSM's increasing adoption and the evolution spurred by commercial players. What I enjoyed about Wikimania was the heavy presence of the strongly opinionated community itself; there were probably more contributor talks than technology talks. But this also speaks to some fundamental differences between Wikipedia and OpenStreetMap.

As collaborative database

There was talk of OSM not as a map but as a collaborative database, and I think this is where OSM and Wikipedia differ the most. OSM is made up of points, lines, polygons and networks, annotated into a fluid but somewhat agreed-upon taxonomy. All of this information can be queried, filtered, selected and aggregated because it's stored and recorded consistently (see the Overpass API for one means of limited extraction). In this respect, OSM has more in common with Wikidata, DBPedia, or Freebase than with Wikipedia itself. Those are structured versions of Wikipedia, generated from regular extracts to give queryable, semantic meaning to the information within article pages. OSM data alone can't really be appreciated by a lay audience; it needs to be processed into a map, or extracted in a meaningful way. But Wikipedia's primary purpose is to provide the sum of knowledge for human consumption first and foremost, which is to say there's no middle computational step that requires entering information in a machine-readable way. Wikipedia is a collaborative knowledge base, whereas OSM is a collaborative database whose data can be rendered, in turn representing a visual collective knowledge.
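
For a flavor of what "queryable" means here, an Overpass query for every cafe node in a bounding box looks something like this (coordinates are illustrative):

    node["amenity"="cafe"](40.70,-74.02,40.75,-73.95);
    out;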

These fundamental differences manifest themselves in the respective conferences and their active user bases. State of the Map was an awesome map-nerd techfest, totally worthwhile for a guy like me. Wikimania was also really impressive, at times esoteric, with some seriously technical and design aspects, but on the whole it can appeal to a less technically oriented crowd.

This all said, one thing that came up a lot at State of the Map was iD, the easier-to-use editor for adding features to and editing OSM. I tried it out today and there was virtually zero learning curve. It's really well designed and definitely appeals to a lay SimCity audience. I drew a few features missing from my Long Island elementary school and used MapRoulette to find some random things to fix up. Give it a whirl: http://www.openstreetmap.org/edit?editor=id

BTW, all videos from State of the Map are available here:
http://wiki.openstreetmap.org/wiki/State_Of_The_Map_U.S._2014/Video_recordings


MrSID / FileGDB support in GDAL and QGIS with Homebrew

There is a geographic raster format and his name is MrSID. And there's an ESRI vector format called the File Geodatabase (FileGDB). These are both proprietary, closed formats that don't have native support within the most powerful open-source geospatial tools, GDAL and QGIS. To add support for these formats, I needed to compile GDAL with the right options. With support for these formats in GDAL, I was then able to get the data into QGIS or PostGIS without the need for ArcGIS to act as a converter.

This is how I did it with Homebrew on OS X.

  1. Make sure Homebrew is up to date
    brew update

  2. Install GDAL with these flags:
    brew install gdal --enable-unsupported --complete --with-postgres
    (I needed to use brew reinstall as I already had GDAL on my system)

  3. Download the SDKs provided by LizardTech (MrSID's creator) and ESRI, which the GDAL plugins will use, into this directory: /Library/Caches/Homebrew.

  4. Get the GDAL plugin kegs for Homebrew from the OSGeo4Mac tap:
    brew tap dakcarto/osgeo4mac
    brew install gdal-mrsid
    brew install gdal-filegdb

  5. Add this line to ~/.bash_profile and restart the session:
    export GDAL_DRIVER_PATH=/usr/local/lib/gdalplugins

  6. Ensure everything is ok:

    • MrSID is a raster format, so check with:
      gdalinfo --formats
      Look for:
      MrSID (rov): Multi-resolution Seamless Image Database (MrSID)
    • FileGDB is a vector format, so check with:
      ogrinfo --formats
      Look for:
      "FileGDB" (read/write)

Success? Now I can...

  • Convert MrSID to GeoTIFF:
    gdal_translate -of GTiff input.sid output.tif

  • FileGDB to PostGIS:

    ogr2ogr -overwrite -skipfailures -f "PostgreSQL" PG:"host=myhost user=myuser dbname=mydb password=mypass" "/somefolder/BigFileGDB.gdb" "MyFeatureClass"
    
  • FileGDB to a shapefile (this will truncate field names to 10 characters, and field values to 255 characters)

    ogr2ogr -f "ESRI Shapefile" <out_directory> /path/to/your/database.gdb
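
  • Double-check any of the above by listing the layers inside a FileGDB with ogrinfo (same hypothetical path as before):

    ogrinfo "/somefolder/BigFileGDB.gdb"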
    

Related awesome link: GDAL cheatsheet https://github.com/dwtkns/gdal-cheat-sheet


Interacting with Academia

This week I recovered from the flu just in time to talk at the "Going Spatial" workshop at the Harvard Center for GIS, largely to academic folks in history and the social sciences.

The goal here was to get the crowd pulling inspiration from examples in the modern interactive and visualization scene, from the digital journalism world and beyond, where so much interesting work is constantly being published. I see the goals of news orgs and academia as not so incompatible. Academia tends to produce work for peer review (not a lay audience), but researchers often want to engage with the public to increase awareness of the importance of their research. The problem is that it can be a struggle to strike a balance between doing justice to the complexity of the research and being comprehensible to a public unfamiliar with the seemingly esoteric focus that often accompanies academic work.

But comprehensibility is exactly what the best news graphics and interactives achieve. Topics are distilled, annotated, and given enough context to be relevant to readers, in language meant not to intimidate. And it's done for the sake of expressing knowledge. So I think it's a good idea for academics to pay attention to the creativity and conventions of expression in digital news, and to reap the benefits of all the tools being used and developed for more rapid visualization and more fluid interactivity.

A lot has been written about what makes a good visualization or interactive (Alberto Cairo's The Functional Art covers this very well), and during the workshop I did a quick overview of what contributes to a quality app. I did this by way of example, with links to the interactives highlighted here (http://bit.ly/webspatial). The common thread is that all of the examples are geospatial in some form or another, in line with the theme of the workshop.

So what is it that makes an effective interactive? Here are five things that can be found in effective visualizations (not necessarily all at once, though occasionally so):

Beauty

This can be a loaded term. Beauty is subjective, but many designers and cartographers agree on certain guiding principles: theory on color, white space, typography, scale relationships, etc. It can be learned, and even a cursory appreciation of the principles can be enlightening. Edward Tufte's books are a great place to start.

Dynamic Facets

The best interactives aren't necessarily just a nice-looking map. Often you'll have a map visualization accompanied by a chart, each representing different facets of the data. Interacting with the chart highlights equivalent information in the map, and interface controls filter and highlight information dynamically within each linked visualization. The visualizations enhance one another, the whole being greater than the sum of its parts.

Annotations

Alberto Cairo would call this a presentational layer. Rather than just throwing someone at a map to explore, curate a few areas of interest to guide the user through important aspects of the topic. It's also important to think about context: is a user going to appreciate a topic without placing it in the bigger picture of space and time?

Near and Far Views

This is something of a guiding principle at ProPublica, related to the previous point. Summarize a topic from a higher level, providing overview information at a glance, as a far view (the forest), then give the user a mechanism to drill into the data or visualization and see how the topic is personally relevant, in a near view (the trees). For instance, show a regional overview of patterns in a geographic area, then offer an address search to see patterns on your block.

Not a map

Even though a lot of data might be geographic, that doesn't mean a map is the best choice for visualization. Think about whether there are actual spatial patterns worth highlighting. Maybe it makes sense to rank places by non-spatial criteria to highlight other relevant patterns.

~~~~~

So here's what I got out of this workshop: academics are definitely interested in this stuff, but there can be huge technical hurdles to overcome before even starting to think about translating work into an interactive format. Beyond concepts, I need to be more specific about which tools and resources might translate well from the data journalism and design/development world...


Upgrading Ghost

This blog is powered by Ghost, the charmingly simple Markdown-powered blog platform, still in active development and missing a lot of basic features.

I've been happy with it so far, but I managed to knock my site out today by running an upgrade.

In updating the Node modules, it took a while to realize that npm install --production was failing silently on the SQLite modules. I had to run npm install sqlite3 --build-from-source manually and deal with my server resetting midway.
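
The sequence that eventually worked looked roughly like this (the path and the restart step depend on your setup):

    cd /path/to/ghost              # your Ghost install directory
    npm install --production      # this was failing silently on sqlite3
    npm install sqlite3 --build-from-source
    # then restart however your server manages the Node process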

Others have had this kind of problem: http://docs.ghost.org/installation/troubleshooting/

Also found a nice update script in the process:
http://www.allaboutghost.com/how-to-update-to-ghost-0-4-2/

Now that I'm upgraded, what did I get? Whole lotta tags. Yeah real tag functionality was lacking until now. In a blog platform. I'm like so bleeding edge.


Memories of mid-march

Started doing some research for a new project with Al Shaw. Work so far has included a crash course in QGIS to explore historical data and big hydrography shapefiles, then getting the files into a Postgres/PostGIS database and rendering them as tiles with ProPublica's SimplerTiles. Loving the GIS stuff.

I'm also exploring climate change financial data. I imported records into a few relational tables in a Rails app, then sent them back out to the browser, lookin' good. I tried to visualize the data in a D3 Sankey diagram but was really unhappy with how inflexible the plugin was, so I made a few modifications that might be useful to others. Moving forward with the viz.

Coming up: I'm talking at a spatial analysis workshop at the Hutchins Center (home of the W.E.B. Du Bois Institute) on March 31st. Details here: http://hutchinscenter.fas.harvard.edu/events-lectures/events/march-31-2014-230pm/going-spatial-fellows-workshop-erika-kitzmiller

Bonus: Sitting at my desk, looking out across the East River, I witnessed a slowly rotating bird vortex reaching into the sky. Basically, time is a flat circle.


Adding flexibility to the D3 Sankey Plugin

Getting data into the D3 Sankey plugin is frustrating. It requires a specific data structure that isn't so straightforward to achieve or debug, and it requires specific field names within the data. I added a "schema" function to the plugin that accepts arbitrary field names and allows referencing a particular ID key within nodes. Find the updated plugin here.

The state of affairs

Data structure currently required by the original plugin:

{"nodes":[
  {"name":"Agricultural 'waste'"},
  {"name":"Bio-conversion"},
  {"name":"Liquid"},
  {"name":"Losses"},
  {"name":"Solid"},
  {"name":"Gas"},
  {"name":"Biofuel imports"}
],
"links": [
  {"source":0,"target":1,"value":124.729},
  {"source":1,"target":2,"value":0.597},
  {"source":1,"target":3,"value":26.862},
  {"source":1,"target":4,"value":280.322},
  {"source":1,"target":5,"value":81.144},
  {"source":6,"target":2,"value":35}
]}

It's an array of nodes and an array of links. The "source" and "target" values within links reference specific nodes: "source":6 refers to the 7th node in the node array. So, node IDs are implied by their order in the array.

Note the names of these keys: source, target, value. These are non-negotiable. The data needs to be transformed to use these specific names for the plugin to function.

A new plugin function

Using the node array order to provide IDs keeps the data compact, but it forces extra data transformation steps to create IDs that reference the established order. Databases will already have unique IDs for their records to reference.

So, I altered the plugin to be a bit more flexible with data formats.

I added a new option called "schema" that solves both of these annoyances. It allows referencing a specific node ID field (assuming the values are unique), and it allows arbitrary field names for link items. Consider this sample JSON:

{"nodes":[
  {"type_id": 12, "name":"Agricultural 'waste'"},
  {"type_id": 16, "name":"Bio-conversion"},
  {"type_id": 17, "name":"Liquid"},
  {"type_id": 10, "name":"Losses"},
  {"type_id": 19, "name":"Solid"},
  {"type_id": 13, "name":"Gas"},
  {"type_id": 14, "name":"Biofuel imports"}
],
"links": [
  {"fuel_origin":12,"fuel_dest":16,"val":124.729},
  {"fuel_origin":16,"fuel_dest":17,"val":0.597},
  {"fuel_origin":16,"fuel_dest":10,"val":26.862},
  {"fuel_origin":16,"fuel_dest":19,"val":280.322},
  {"fuel_origin":16,"fuel_dest":13,"val":81.144},
  {"fuel_origin":14,"fuel_dest":17,"val":35}
]}

Still nodes and links arrays, but each node has a unique ID, type_id, and each link has a fuel_origin and fuel_dest. These can be named anything; maybe they're the field names that come out of a CSV or a database.
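
Without the schema option, you'd be writing that transformation boilerplate yourself. A minimal sketch of it, using the hypothetical field names above:

    // Map each node's type_id to its array position, then rewrite the
    // links to use positions and the required source/target/value names.
    var indexById = {};
    energy.nodes.forEach(function(node, i) {
      indexById[node.type_id] = i;
    });
    var links = energy.links.map(function(link) {
      return {
        source: indexById[link.fuel_origin],
        target: indexById[link.fuel_dest],
        value: link.val
      };
    });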

Then declare the sankey as normal:

    d3.sankey()
      .size([width, height])
      .nodeWidth(15)
      .nodePadding(10)
      .nodes(energy.nodes)
      .links(energy.links)
      .schema({
          id: "type_id",
          source: "fuel_origin",
          target: "fuel_dest",
          value: "val"
      })
      .layout(32);

New [optional] option! .schema({}) accepts four keys, all of which reference field names. (If it's not provided, Bostock's original data structure will work.)

  • id: the name of the node id field. default: "id"
  • source: the name of the field to be used as the link source. default: "source"
  • target: the name of the field to be used as the link target. default: "target"
  • value: the name of the field to be calculated for node size. default: "value"

I'm using the ".filter()" function in javascript to make this work, which means only IE9+ support. But such is D3.

Repo: https://github.com/briantjacobs/d3-plugins/tree/master/sankey


MATLAB and Google Styled Maps

For more options when plotting geospatial data in MATLAB, I gently modified the plot_google_map function to accept styling parameters for street maps. Download here or check out the Gist.

There aren't a lot of great options for plotting geospatial data atop a basemap directly in MATLAB. You can export data for import into a GIS package like ArcGIS or QGIS, but you may want to work entirely within MATLAB, for automation or for ease of regenerating images.

There's a function that does a great job of generating a Google basemap, called plot_google_map. It pulls in a corresponding street or aerial map according to your already-plotted data (assuming WGS84), using the Google Static Maps API.

Google's default map styles can leave something to be desired though—they're not really one-size-fits-all. The streets map in particular can be more obtrusive than helpful as a context, with its yellow tints and aggressive labelling.

Enter Google Styled Maps. You can add an additional set of parameters to style Google street maps. It's pretty flexible, and it works with the Static Maps API, which means that with a few edits, the plot_google_map function can accept styling parameters.

Tooling around with the Styled Map Wizard can give you a sense of what's possible and generate the code you need to plug into a plot. I actually strongly recommend this modified version of the wizard; the only difference is that it offers an "Edit JSON" button as opposed to "Show JSON," which lets you restore a style you've created previously.

Go to the modified wizard, hit "Edit JSON," paste this in, and you'll see a desaturated basemap with fewer labels:

[
  {
    "stylers": [
      { "saturation": -100 }
    ]
  },{
    "featureType": "transit",
    "stylers": [
      { "visibility": "off" }
    ]
  },{
    "featureType": "road",
    "stylers": [
      { "visibility": "simplified" }
    ]
  },{
    "featureType": "road.highway",
    "stylers": [
      { "visibility": "off" }
    ]
  },{
    "featureType": "water",
    "stylers": [
      { "lightness": -6 }
    ]
  },{
    "elementType": "labels.text",
    "stylers": [
      { "weight": 0.1 },
      { "gamma": 4.25 }
    ]
  },{
    "featureType": "water",
    "elementType": "labels",
    "stylers": [
      { "visibility": "off" }
    ]
  },{
    "featureType": "administrative.locality",
    "stylers": [
      { "visibility": "off" }
    ]
  }
]

So, how do you get that map style into MATLAB? Go back to the wizard and hit the "static map" button, then take a look at the link below the map. It has the parameterized version of that JSON that you can plug into the MATLAB script. Find the first &style= and grab everything from there on:

&style=saturation:-100&style=feature:transit|visibility:off&style=feature:road|visibility:simplified&style=feature:road.highway|visibility:off&style=feature:water|lightness:-6&style=element:labels.text|weight:0.1|gamma:4.25&style=feature:water|element:labels|visibility:off&style=feature:administrative.locality|visibility:off

After you add my modified plot_google_map function to your MATLAB project, you can run this script to plug the styling settings into the function.
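
The call itself looks roughly like this; the 'Style' parameter name is illustrative of how my modified function takes the string (check the function's header for the real interface), and the coordinates are made up:

    % Plot some points, then pull a styled roadmap in behind them
    lat = [42.35 42.37]; lon = [-71.06 -71.09];
    plot(lon, lat, '.r', 'MarkerSize', 20)
    hold on
    % The style string is the parameterized output copied from the wizard
    style = ['&style=saturation:-100&style=feature:transit|visibility:off' ...
             '&style=feature:road.highway|visibility:off'];
    plot_google_map('MapType', 'roadmap', 'Style', style)  % 'Style' here is hypothetical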

...which turns this...

...into this...

Find some more examples here.