Visualizing Hacker News stories using Elasticsearch & Kibana

Introduction

Over at Hacker News, items such as stories (but also comments, jobs, and polls) are added every second.

A story is represented by a title and a URL. A user's karma (number of points) is roughly the number of upvotes on their stories and comments minus the number of downvotes.

I wanted a better way of displaying the popularity of a Hacker News story.

I chose Elasticsearch to index the data as it’s dead easy to set up and use. With Kibana alongside it to visualize the data, I had a working solution up and running in less than a day.

HN API

Hacker News provides a RESTful API which makes data available in near real time.

The base URI for the API is https://hacker-news.firebaseio.com/v0

Endpoint	Description
`/topstories.json?print=pretty`	Returns the IDs of up to 500 top stories
`/newstories.json?print=pretty`	Returns the IDs of up to 500 new stories
`/item/<id>.json?print=pretty`	Returns the details of a single story

New stories

Like the top stories endpoint, a request for the new stories returns a JSON array of story IDs.
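As a rough illustration, the ID list could be fetched with Python's standard library along these lines. The helper names `story_list_url` and `fetch_story_ids` are made up for this sketch, and `urllib` is an assumption about tooling rather than what was necessarily used here.

```python
import json
import urllib.request

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def story_list_url(kind: str) -> str:
    """Build the URL of the 'top' or 'new' story ID list (hypothetical helper)."""
    if kind not in ("top", "new"):
        raise ValueError("kind must be 'top' or 'new'")
    return f"{BASE_URL}/{kind}stories.json"


def fetch_story_ids(kind: str = "new", timeout: float = 10.0) -> list:
    """Return the list of story IDs; performs a network request when called."""
    with urllib.request.urlopen(story_list_url(kind), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_story_ids("new")` should give a plain list of integers, each of which can then be used to look up a single story.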

Story information

Here we can get more information regarding a particular story posted on HN.

The URI is built from a story ID returned by the top or new stories endpoint.

e.g. https://hacker-news.firebaseio.com/v0/item/22813908.json?print=pretty

This returns the following JSON.

  {
    "by" : "shortj",
    "descendants" : 47,
    "id" : 22813908,
    "kids" : [ 22815121, 22815383, 22814200, 22815940, 22815951, 22815605, 22815999, 22815381, 22814987, 22815964, 22815312, 22815889, 22816003 ],
    "score" : 114,
    "time" : 1586360523,
    "title" : "Rules for Data Modeling with DynamoDB",
    "type" : "story",
    "url" : "https://www.trek10.com/blog/the-ten-rules-for-data-modeling-with-dynamodb"
  }

More information can be found at https://github.com/HackerNews/API
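A minimal sketch of fetching one story and reducing it to the fields of interest (title, url, score, id and a posting timestamp) might look like the following. The function names are hypothetical, and formatting the timestamp in UTC is an assumption; local time would work just as well.

```python
import json
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(story_id: int) -> str:
    """Build the item URL for a story ID (hypothetical helper)."""
    return f"{BASE_URL}/item/{story_id}.json"


def fetch_item(story_id: int, timeout: float = 10.0) -> dict:
    """Return the story JSON as a dict; performs a network request when called."""
    with urllib.request.urlopen(item_url(story_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_document(item: dict) -> dict:
    """Keep a subset of the fields and turn the Unix time into a readable string."""
    # Assumption: the timestamp is rendered in UTC.
    added = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "title": item["title"],
        "url": item.get("url", ""),  # text-only posts (e.g. Ask HN) carry no url
        "score": item["score"],
        "id": item["id"],
        "added": added.strftime("%Y-%m-%d %H:%M:%S"),
    }
```

For the story shown above, `to_document` would produce an `added` value of `2020-04-08 15:42:03`.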

Docker containers

Both Elasticsearch and Kibana are set up within Docker containers.

docker-compose file

version: "3.7"
networks:
  default:
    driver: bridge
    name: devops

services:
  elasticsearch:
    restart: always
    image: elasticsearch:7.6.2
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      discovery.type: single-node
  
    container_name: elasticsearch
  
  kibana:
    restart: always
    image: kibana:7.6.2
    ports:
      - "5601:5601"
    container_name: kibana

Elasticsearch docker container

Setting the discovery type to single-node means the node elects itself as master and will not join a cluster with any other node.

This is sufficient for development and testing.
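Once the containers are up, a quick way to confirm the single node is ready is to query the cluster health endpoint. The sketch below assumes Elasticsearch is reachable at `http://localhost:9200` (matching the port mapping above); the helper names are made up for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: port 9200 published on localhost


def cluster_health_url(base_url: str = ES_URL) -> str:
    """Build the URL of the cluster health endpoint."""
    return f"{base_url}/_cluster/health"


def is_usable(health: dict) -> bool:
    """Green or yellow means all primary shards are allocated."""
    return health.get("status") in ("green", "yellow")


def fetch_health(base_url: str = ES_URL, timeout: float = 5.0) -> dict:
    """Return the health response as a dict; performs a network request when called."""
    with urllib.request.urlopen(cluster_health_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

On a single node a yellow status is common, since replica shards have no second node to be allocated to.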

Elasticsearch APIs

Create index API

Adds a new index, ‘hn-new’, to an Elasticsearch cluster. This index will be used for all HN new stories.

curl -X PUT "<el-host>:9200/hn-new/"
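The same request could be made from Python. This is only a sketch: it assumes Elasticsearch is listening on `http://localhost:9200`, and `create_index_request` and `create_index` are hypothetical helper names.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: stands in for <el-host>:9200


def create_index_request(index: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build a body-less PUT request that creates the given index."""
    return urllib.request.Request(f"{base_url}/{index}", method="PUT")


def create_index(index: str, base_url: str = ES_URL, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status; performs a network request."""
    # urlopen raises urllib.error.HTTPError on a 4xx response,
    # for example when the index already exists.
    with urllib.request.urlopen(create_index_request(index, base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `create_index("hn-new")` is then the equivalent of the curl command above.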

Index mappings

The mapping defines the fields in an index and the type of each field.

title = HN story title - type = keyword
url = the link to where the story resides on the web - type = keyword
score = the number of points currently accumulated - type = integer
id = the HN story ID - type = integer
added = the timestamp of when the HN story was posted - type = keyword

The keyword type indexes each value as-is, without analysis, so we can filter on exact values of those fields.

curl -X PUT "<el-host>:9200/hn-new/_mapping" -H 'Content-Type: application/json' -d'
{
    "properties": {
        "title": { "type": "keyword" },
        "url": { "type": "keyword" },
        "score": { "type": "integer" },
        "id": { "type": "integer" },
        "added": { "type": "keyword" }
    }
}
'
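For completeness, here is how the mapping request might be built in Python instead of curl. As before, the host `http://localhost:9200` and the name `put_mapping_request` are assumptions made for this sketch.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: stands in for <el-host>:9200

# The same field definitions as in the curl command above.
HN_MAPPING = {
    "properties": {
        "title": {"type": "keyword"},
        "url": {"type": "keyword"},
        "score": {"type": "integer"},
        "id": {"type": "integer"},
        "added": {"type": "keyword"},
    }
}


def put_mapping_request(index: str, mapping: dict, base_url: str = ES_URL) -> urllib.request.Request:
    """Build a PUT request carrying the mapping as a JSON body."""
    return urllib.request.Request(
        f"{base_url}/{index}/_mapping",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen` sends it, for example `urllib.request.urlopen(put_mapping_request("hn-new", HN_MAPPING))`.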

Document API

Use this API to add a JSON document to the specified index and make it searchable.

PUT /<index>/_doc/<_id>

The <_id> should be a unique identifier. For this I used the HN story ID, since each story ID is unique.

JSON document

With Python I made an HTTP PUT request with a JSON payload. The payload looked like this:

  {
      "title":"Google’s former CEO hopes coronavirus makes people more “grateful” for Big Tech",
      "url":"https://www.vox.com/recode/2020/4/14/21221141/coronavirus-eric-schmidt-google-big-tech-grateful",
      "score":1,
      "id":22876051,
      "added":"2020-04-15 09:57:24"
  }
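A request like this could be put together as shown below. This is a sketch rather than the exact code used: the standard-library `urllib`, the host `http://localhost:9200` and the name `index_document_request` are all assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: stands in for <el-host>:9200


def index_document_request(index: str, doc: dict, base_url: str = ES_URL) -> urllib.request.Request:
    """Build PUT /<index>/_doc/<_id>, using the story ID as the document ID."""
    return urllib.request.Request(
        f"{base_url}/{index}/_doc/{doc['id']}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example payload with the same shape as the document above.
example_doc = {
    "title": "Rules for Data Modeling with DynamoDB",
    "url": "https://www.trek10.com/blog/the-ten-rules-for-data-modeling-with-dynamodb",
    "score": 114,
    "id": 22813908,
    "added": "2020-04-08 15:42:03",
}
example_request = index_document_request("hn-new", example_doc)
# To send it: urllib.request.urlopen(example_request)
```

Because the story ID doubles as the document ID, sending the same story again replaces the stored document instead of creating a duplicate, which is handy when a score changes.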