A movie news aggregator using Python, Flask & Bootstrap

Introduction

I like using Python a lot, and I like reading about movie news too.

I developed my own RSS aggregator so that, instead of visiting multiple sites every day to get my movie news, I only need to visit one.

Sure, I could have used an existing RSS aggregator service, but I wanted to understand how I would develop one myself.

In this post I’ll go over the steps I took to build the application.

RSS Feed

Most websites and blogs offer an RSS feed so that readers can keep track of changes to their content. It is relatively easy to spot, as it has a distinctive icon. Depending on how the content creator has set up the feed, the feed URL will often display an XML-encoded page.

On some sites it’s hidden, so you may need to look in the HTML source of the site to find the RSS feed. In this case, just search for ‘rss’ in the source.

<link rel="alternate" type="application/rss+xml" title="RSS Feed" href="//url_to_feed/" />
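The search for that link tag can also be automated. Below is a minimal sketch using only Python's standard library. The class name FeedLinkFinder, the helper find_feed_urls and the sample page are made up for illustration and are not part of my application.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link> tag that advertises an RSS feed."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are routed here as well.
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "application/rss+xml" and attributes.get("href"):
            self.feed_urls.append(attributes["href"])


def find_feed_urls(html_source):
    """Return all RSS feed URLs advertised in the given HTML source."""
    finder = FeedLinkFinder()
    finder.feed(html_source)
    return finder.feed_urls


# Hypothetical page source, used only to demonstrate the helper.
sample_page = """
<html><head>
<link rel="stylesheet" href="/style.css" />
<link rel="alternate" type="application/rss+xml" title="RSS Feed" href="https://example.com/feed" />
</head><body></body></html>
"""
print(find_feed_urls(sample_page))
```

This only looks at the type attribute, so Atom feeds (application/atom+xml) would need an extra check.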

For the movie news I only used the feeds from the 3 main websites I visit.

  1. Slashfilm - https://feeds2.feedburner.com/slashfilm
  2. Collider - http://collider.com/feed
  3. Indiewire - https://www.indiewire.com/feed

A typical RSS feed has a lot of XML tags and elements. The most important ones are shown below.

<item>
    <title>title of news item goes here</title>
    <link>https://<link_to_news_item></link>
    <pubDate>Fri, 27 Sep 2019 18:20:50 +0000</pubDate>
    <description><short_overview_of_news_item></description>
</item>
  • <item> - opening tag to identify a news item
  • <link> - provides a link to where the news item was initially published
  • <pubDate> - timestamp of the publication date
  • <description> - often a small paragraph taken from the original news item
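The pubDate value uses the RFC 822 date format, the same one used in email headers, so Python's standard library can turn it into a datetime object. A small sketch, using the timestamp from the sample item above:

```python
from email.utils import parsedate_to_datetime

# Timestamp copied from the sample <pubDate> element.
pub_date = parsedate_to_datetime("Fri, 27 Sep 2019 18:20:50 +0000")

print(pub_date.isoformat())           # 2019-09-27T18:20:50+00:00
print(pub_date.strftime("%H:%M:%S"))  # 18:20:50
```

Having a real datetime makes it easy to sort items or to check whether an item was published today.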

Using Python to process an RSS feed

As I mentioned before, the RSS feed will be in XML format.

In the example below I use the requests module to make an HTTP GET request to the RSS feed URL.

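import sys
from xml.etree import ElementTree

import requests

# feed_url and headers are assumed to be defined earlier.
try:
    # Note: verify=False disables TLS certificate verification.
    response = requests.get(feed_url, headers=headers, verify=False)
    if response.status_code == 200:
        feed_data = response.text
        if feed_data is not None:
            root = ElementTree.fromstring(feed_data)
            # Walk every <item> and inspect each of its child elements.
            for it in root.iter('item'):
                for x in it:
                    if x.tag == 'title':
                        pass  # do something..
                    elif x.tag == 'pubDate':
                        pass  # do something..
                    elif x.tag == 'link':
                        pass  # do something..
                    elif x.tag == 'description':
                        pass  # do something..
except Exception:
    print("Unable to process request feed data.")
    print(sys.exc_info()[0])
    raise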
  • response captures the response from the GET request
  • response.status_code is checked for 200 OK
  • feed_data stores response.text from the server, which is the XML-encoded message
  • root stores the root element of the XML-encoded message
  • the next few lines iterate over the XML message, pulling out the title, publication date, link and description of each item in the RSS feed
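To make the "do something" placeholders concrete, here is one possible way to collect each item into a dictionary. It is a sketch rather than the exact code from my application: the sample feed and the parse_items helper are invented for the example, and it parses a string so it runs without a network connection.

```python
from xml.etree import ElementTree

# Hypothetical feed content, shaped like a typical RSS 2.0 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <pubDate>Fri, 27 Sep 2019 18:20:50 +0000</pubDate>
      <description>Short overview of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
      <pubDate>Sat, 28 Sep 2019 09:00:00 +0000</pubDate>
      <description>Short overview of the second story.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item>, with an empty string for missing tags."""
    root = ElementTree.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


items = parse_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], "-", entry["link"])
```

Using findtext avoids the chain of if statements and copes with feeds that leave out one of the tags.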

Configuration file

I store a list of feeds which I want to pull from in a .json file.

I then use Python to read in the file, then make a request for a particular feed.

{
    "feeds": [
        {
            "name": "slashfilm",
            "link": "https://feeds2.feedburner.com/slashfilm"
        },
        {
            "name": "collider",
            "link": "http://collider.com/feed"
        },
        {
            "name": "indiewire",
            "link": "https://www.indiewire.com/feed"
        }
    ]
}
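Reading the configuration could look roughly like the sketch below. The file name feeds.json and the load_feeds helper are assumptions made for the example. The demo writes a small config to a temporary folder first so that it can run on its own.

```python
import json
import os
import tempfile


def load_feeds(config_path):
    """Return a list of (name, link) tuples from the JSON config file."""
    with open(config_path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    return [(feed["name"], feed["link"]) for feed in config["feeds"]]


# Demo only: write a small config, in the format shown above, to a temp folder.
demo_config = {
    "feeds": [
        {"name": "slashfilm", "link": "https://feeds2.feedburner.com/slashfilm"},
        {"name": "collider", "link": "http://collider.com/feed"},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "feeds.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as config_file:
        json.dump(demo_config, config_file)
    feeds = load_feeds(path)

for name, link in feeds:
    print(name, link)
```

Keeping the feeds in a config file means a new site can be added without touching the Python code.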

Flask

I have Flask running locally on its default port, 5000. Only use this setup for testing and development purposes.

Note

Do not use this setup in production.

Flask provides an easy way to expose RESTful endpoints.

@app.route('/movie/getnews', methods=['GET'])
def getMovieNews():
    return movie_aggregator.process_feeds()

The @app.route decorator maps a URL (in this case /movie/getnews) to a view function called getMovieNews().

Whenever an HTTP GET request is made to http://localhost:5000/movie/getnews, Flask calls the associated view function.

Setting methods=['GET'] ensures only GET requests are allowed.

The view function returns a JSON response.

[
    {
        "description":"Dreamworks Animation's",
        "link":"http://collider.com/friday-box-office-abominable-downton-abbey/",
        "pubdata":"16:25:50",
        "src":"collider",
        "title":"Friday Box Office",
        "today":true
    }
]
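For illustration, one entry of that response could be built from a parsed RSS item along the lines below. This is a sketch under assumptions: the to_response_entry helper is hypothetical, and comparing dates in UTC to set the today flag is my guess for the example, not necessarily how the application decides it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_response_entry(item, src, now=None):
    """Shape a parsed RSS item into the structure returned by the endpoint."""
    now = now or datetime.now(timezone.utc)
    published = parsedate_to_datetime(item["pubDate"])
    # Assumption: "today" means the same calendar day in UTC.
    is_today = (
        published.astimezone(timezone.utc).date()
        == now.astimezone(timezone.utc).date()
    )
    return {
        "description": item["description"],
        "link": item["link"],
        "pubdata": published.strftime("%H:%M:%S"),
        "src": src,
        "title": item["title"],
        "today": is_today,
    }


# Hypothetical item, as it might come out of the XML parsing step.
sample_item = {
    "title": "Friday Box Office",
    "link": "http://collider.com/friday-box-office-abominable-downton-abbey/",
    "pubDate": "Fri, 27 Sep 2019 18:20:50 +0000",
    "description": "Dreamworks Animation's",
}
fixed_now = datetime(2019, 9, 27, 20, 0, tzinfo=timezone.utc)
entry = to_response_entry(sample_item, "collider", now=fixed_now)
print(entry)
```

Passing now in as a parameter keeps the function easy to test with a fixed date.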

Using curl I can make a request for JSON data from the /movie/getnews endpoint.

The response will be in the JSON format shown above.

curl http://localhost:5000/movie/getnews
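The endpoint can also be exercised from Python without starting a server, using Flask's built-in test client. The sketch below is self-contained, so it defines its own small app. The process_feeds function is a hard-coded stand-in for the real aggregator, and wrapping its result in jsonify is an assumption about how the JSON is produced.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def process_feeds():
    """Stand-in for the real aggregator; returns one hard-coded news item."""
    return [{
        "description": "Dreamworks Animation's",
        "link": "http://collider.com/friday-box-office-abominable-downton-abbey/",
        "pubdata": "16:25:50",
        "src": "collider",
        "title": "Friday Box Office",
        "today": True,
    }]


@app.route('/movie/getnews', methods=['GET'])
def get_movie_news():
    return jsonify(process_feeds())


with app.test_client() as client:
    response = client.get('/movie/getnews')
    print(response.status_code)          # 200
    print(response.get_json()[0]["title"])

    # The route only allows GET, so a POST is rejected.
    post_response = client.post('/movie/getnews')
    print(post_response.status_code)     # 405
```

The 405 response on POST shows the effect of restricting the route to GET requests.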

Now that we have a RESTful endpoint exposed using Flask, it’s time to look at the HTML page the user will see.

Jinja2 template engine

Flask has a render_template function which renders a template from a Flask template folder.

The template folder holds all the template files, which can be in .html format.

These template files support the Jinja2 template engine.

So with Jinja2 you can use for loops, if statements and variables within HTML code.

An example of a for loop is shown below, where item.name will be rendered as text when the HTML page is served by Flask.

<div class="card-body">
    {%for item in myList%}
        <b>{{item.name}}</b>                                    
    {%endfor%}
</div>
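To see what the loop produces, the same snippet can be rendered directly with the jinja2 package, which is installed alongside Flask. This is only a sketch: the contents of myList are invented for the example.

```python
from jinja2 import Template

# The same for loop as in the snippet above, as an inline template.
template = Template(
    '<div class="card-body">\n'
    '    {% for item in myList %}\n'
    '        <b>{{ item.name }}</b>\n'
    '    {% endfor %}\n'
    '</div>'
)

# Hypothetical data; Jinja2 resolves item.name for dictionaries as well.
html = template.render(myList=[{"name": "slashfilm"}, {"name": "collider"}])
print(html)
```

In a Flask view the same data would be passed as a keyword argument to render_template instead.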

Now, going back to the movie news aggregator.

The template file being used is ‘movie_feed.html’.

Whenever a user navigates to http://localhost:5000/movie/news, the Flask view function render_template_file_movieNews() is called, which returns the rendered template file.

@app.route('/movie/news')
def render_template_file_movieNews():
    return render_template('movie_feed.html')

HTML

Here is a look at the HTML page once rendered in the browser.

I am using the Bootstrap toolkit to add CSS and JavaScript components to the page.

On the left panel there is a table populated with news items taken from the RSS feeds.

Clicking on an item on the left updates the right panel with the title and description of the news item.

I use jQuery to make an asynchronous call to the /movie/getnews RESTful endpoint exposed via Flask. More information is available in the jQuery AJAX API documentation.

function getLatestNews() {
    //console.clear();
    console.log('latest');

    $("#tableservices-movie-news").bootstrapTable('removeAll');
    $("#tableservices-movie-news").bootstrapTable('showLoading');

    var i = 0;
    var today_count = 0;
    $.ajax({
        type: 'GET',
        url: '/movie/getnews',
        dataType: 'json',
        success: function (data) {
            $.each(data, function (index, item) {

                if (item.today == true) {
                    today_count++;
                }

                $("#tableservices-movie-news").bootstrapTable('insertRow', {
                    index: i,
                    row: {
                        Title: "<div id='feed_title' style='display:inline; font-size:13px' width='100%' data-title=\"" + item.title + "\">" + item.title + "<span style='padding-left: 10px'/>" +
                            "</div>" + "<div class='text-muted' style='display:inline-block'> <span style='padding-left:2px;font-size:10px'>" +  "(" + item.src + ")" +  "<span style='padding-left: 5px'/>" + item.pubdata + "</span> </div>",

                        feed_metadata: {
                            title: item.title,
                            link: item.link,
                            description: item.description,
                            contentEncoded: item.contentEncoded,
                            publishedToday: item.today,
                            src: item.src
                        },
                    },


                });
                i++;
            });
            $("#tableservices-movie-news").bootstrapTable('hideLoading');
        }
    });
}
  • tableservices-movie-news is the table shown on the left side of the HTML page
  • bootstrapTable('removeAll') calls the Bootstrap Table library and clears all existing rows from the table
  • bootstrapTable('showLoading') shows a ‘Please wait loading’ message just before the table is loaded with data asynchronously
  • bootstrapTable('insertRow', {index, row}) inserts a new row into the table using the data coming from the RESTful API exposed via Flask
  • feed_metadata is an additional object attached to each row in the table
  • bootstrapTable('hideLoading') hides the loading message once all the data is loaded into the table

Whenever a row is clicked, the description of the news item is shown in the right panel.

This is achieved via JavaScript.

$('#tableservices-movie-news').on('click-row.bs.table', function (e, row, $el) {
    // e is the jQuery event, row is the clicked row's data, $el is the row element
    $($el).addClass('movie-col').siblings().removeClass('movie-col');
    $('#title').html("<h5>" + row.feed_metadata.title + "<span style='padding-left: 10px'/>" +  "<span class= 'badge badge-danger' style='display: inline-block'>" +  row.feed_metadata.src + "</span>" + "</h5>");
    $('#placeholder').html(row.feed_metadata.description +
        "<br/><br/>" + "<a target='_blank' href='" + row.feed_metadata.link + "'>" + "Read more.." + "</a>");
});
Last updated on 29 Sep 2019
Published on 29 Sep 2019