Sentiment Analysis in Node.js
In this tutorial, we'll explore what sentiment analysis is and why it's useful, and then build a simple program in Node.js that analyzes the sentiment of Reddit comments.
What it is
Sentiment analysis is the process of extracting key phrases and words from text to understand the author's attitude and emotions. So, why is it useful? Companies can use it to make more informed marketing decisions. For example, they can analyze product reviews, feedback, and social media to track their reputation. Additionally, social networks can use sentiment analysis to weed out poor-quality content.
How it works
There are two main approaches to sentiment detection: knowledge-based and statistical.
Knowledge-based approaches usually compare the words in a text to a predefined list of negative and positive words. Finn Årup Nielsen of the Technical University of Denmark published AFINN, a list of positive and negative words in which each word is rated on a scale from -5 to +5. For example, "gloom" has a score of -1, while "awful" has a score of -3. The scores of all known words in the text are added up to determine its overall sentiment.
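To make the summing step concrete, here is a minimal sketch of a knowledge-based scorer. It assumes a tiny hand-written lexicon: the scores for "gloom" and "awful" are the ones quoted above, while the "happy" score is an assumed value for illustration. A real AFINN list has thousands of entries, and lexiconScore is a hypothetical name.

```javascript
// Tiny illustrative lexicon. "gloom" and "awful" use the scores quoted above;
// "happy" is an assumed score for demonstration only.
const lexicon = new Map([
  ["gloom", -1],
  ["awful", -3],
  ["happy", 3],
]);

// Split text into lowercase words and add up the scores of known words.
// Words that are not in the lexicon contribute 0.
function lexiconScore(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let total = 0;
  for (const word of words) {
    total += lexicon.get(word) || 0;
  }
  return total;
}

console.log(lexiconScore("What an awful day of gloom")); // -4
console.log(lexiconScore("Happy despite the gloom")); // 2
```

A Map is used instead of a plain object so that words like "constructor" don't accidentally match inherited object properties.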
Statistical approaches use machine learning: a model learns from text whose sentiment is already known and then predicts the sentiment of text whose sentiment is unknown. For example, Amazon could train a machine learning model on the text and the 1-to-5-star rating of each product review. The model could then estimate the star rating of a new review that doesn't have one yet.
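The statistical idea can also be sketched in a few lines. The example below is a deliberately naive model, not how Amazon or any production system works: it "trains" by recording the average star rating of each word across some hypothetical labeled reviews, then predicts a rating for new text by averaging the learned ratings of the words it recognizes. The review data and the names train and predictStars are made up for illustration.

```javascript
// Hypothetical labeled training data: review text with a known star rating.
const trainingReviews = [
  { text: "great product, works great", stars: 5 },
  { text: "terrible product, broke fast", stars: 1 },
  { text: "works fine", stars: 4 },
];

// Lowercase the text and split it into words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// "Training": map each word to the average star rating of the reviews it
// appears in (a word used twice in one review counts twice).
function train(reviews) {
  const totals = new Map(); // word -> { sum, count }
  for (const { text, stars } of reviews) {
    for (const word of tokenize(text)) {
      const entry = totals.get(word) || { sum: 0, count: 0 };
      entry.sum += stars;
      entry.count += 1;
      totals.set(word, entry);
    }
  }
  const model = new Map();
  for (const [word, { sum, count }] of totals) {
    model.set(word, sum / count);
  }
  return model;
}

// Prediction: average the learned ratings of known words, rounded to a whole
// star. Text with no known words falls back to a neutral 3 stars.
function predictStars(model, text) {
  const known = tokenize(text).filter((word) => model.has(word));
  if (known.length === 0) return 3;
  const sum = known.reduce((acc, word) => acc + model.get(word), 0);
  return Math.round(sum / known.length);
}

const model = train(trainingReviews);
console.log(predictStars(model, "works great")); // 5
console.log(predictStars(model, "broke fast")); // 1
```

Real systems use far more data and better models, but the shape is the same: learn from labeled examples, then predict labels for unlabeled text.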
With any approach, a score is typically given to each body of text that is analyzed. A negative score implies the text has a mostly negative attitude, and a positive score implies the text has a mostly positive attitude.
Potential problems
Natural language can be hard for a program to interpret, so sentiment analysis will never be completely accurate. Here's a brief list of scenarios that can be tricky to analyze:
- Double negatives: "I do not dislike running"
- Inverted double negatives: "Not going to practice isn't really my thing"
- Adverb modifying a verb: "I really hate when people cut me off"
- Possible sarcasm: "I love running with a knee injury"
- Slang terms: "He ran a sick race!"
Our project
We'll be making a Node.js app that calculates the sentiment of comments on a Reddit post asking how people's days are going, and then displays the results in a webpage.
First, make sure you have Node.js installed. Then:
- Create an empty folder
- cd into that directory (with cd ~/Desktop/folder, for example)
- Run npm init to go through the creation wizard
- Install the dependencies we need from npm by running npm install express ml-sentiment
- Download the comments.json file and put it into the folder you created
File structure
Now that our dependencies are installed, let's create and open a server.js file in the folder you created.
var express = require("express");
var app = express();

var ml = require("ml-sentiment")();
var redditComments = require("./comments.json");

const listener = app.listen(3000, function() {
  console.log("Your app is listening on port " + listener.address().port);
});
What does this file do right now? The first block sets up Express, a web server library. The second block imports our sentiment analysis library and the JSON data file of Reddit comments. The last block starts our server and logs which port it is listening on. There is nothing for the server to show yet, though, because we haven't defined any "routes" for Express to use.
The Node library we're using for sentiment analysis, ml-sentiment, has documentation that tells us how we can use it:
var ml = require("ml-sentiment")();
ml.classify("Rainy day but still in a good mood");
// returns 2 ... (overall positive sentiment)
This library uses AFINN-111, which rates 2477 words and phrases. The library looks up each word of the text passed to the .classify function in AFINN-111. If a word like "not" or "don't" precedes a rated word, it uses the absolute value of that word's score. For example, "anxious" has a score of -2, while "not anxious" has a score of 2.
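The negation rule described above can be sketched like this. It is not ml-sentiment's source code, only a minimal illustration of the described behavior under stated assumptions: a three-word lexicon using the scores quoted in this tutorial, a negator list containing just "not" and "don't", and a hypothetical function name, scoreWithNegation.

```javascript
// Assumed lexicon: scores for "anxious", "awful", and "gloom" are the ones
// quoted in this tutorial. A real AFINN-111 list is much larger.
const lexicon = new Map([
  ["anxious", -2],
  ["awful", -3],
  ["gloom", -1],
]);

// Words that, per the description above, change how the next word is scored.
const negators = new Set(["not", "don't"]);

function scoreWithNegation(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let total = 0;
  words.forEach((word, i) => {
    const score = lexicon.get(word) || 0;
    const negated = i > 0 && negators.has(words[i - 1]);
    // A negated word contributes the absolute value of its score.
    total += negated ? Math.abs(score) : score;
  });
  return total;
}

console.log(scoreWithNegation("I feel anxious")); // -2
console.log(scoreWithNegation("I'm not anxious")); // 2
```

This sketch only checks the word immediately before a rated word, so "I don't feel anxious" still scores -2. That is one reason double negatives and longer phrasings remain hard for simple lexicon-based tools.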
This is by no means a comprehensive library, but it's quick to implement, runs fast and works reliably on simple examples.
Next, let's loop through all of the Reddit comments, use the ml.classify function to get a sentiment score for each one, save that score on the comment object in the redditComments array, and pick an emoji to match.
redditComments.forEach(function(comment) {
  // Score the comment text and store the result on the comment object.
  comment.sentiment = ml.classify(comment.body);
  // Pick an emoji based on how positive or negative the score is.
  if (comment.sentiment >= 5) {
    comment.emoji = "😃";
  } else if (comment.sentiment > 0) {
    comment.emoji = "🙂";
  } else if (comment.sentiment === 0) {
    comment.emoji = "😐";
  } else {
    comment.emoji = "😕";
  }
});
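The if/else chain in this loop maps a score onto one of four buckets. Pulling it into a small pure function makes the thresholds easy to check on their own. This is a minimal sketch: emojiFor is a hypothetical name (not part of ml-sentiment), it keeps the same thresholds as the loop, and the emoji characters are illustrative.

```javascript
// Same thresholds as the loop: 5 or more, above 0, exactly 0, below 0.
function emojiFor(score) {
  if (score >= 5) return "😃";
  if (score > 0) return "🙂";
  if (score === 0) return "😐";
  return "😕";
}

// Example: attach an emoji to an object that already has a score.
const comment = { body: "Great day!", sentiment: 3 };
comment.emoji = emojiFor(comment.sentiment);
console.log(comment.emoji); // 🙂
```

Inside the forEach loop, the whole if/else chain could then become comment.emoji = emojiFor(comment.sentiment);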
Now, our redditComments variable is an array of objects with the link, body, author, emoji, and sentiment keys. For example, here's how one object in the array looks:
{
  "link": "https://reddit.com/r/AskReddit/comments/6szu5h/reddit_how_was_your_day/dlgtei6/",
  "body": "It was so nice day. it was my memorable day. ",
  "author": "Gemma_Youl",
  "sentiment": 3,
  "emoji": "🙂"
}
Next, we'll define two routes in Express that send our redditComments data to a webpage. Routes must be defined after app is created; we'll also add them before app.listen is called, which keeps all of the setup in one place.
app.get("/", function(req, res) {
res.sendFile(__dirname + "/index.html");
});
app.get("/data", function(req, res) {
res.json(redditComments);
});
The first route tells Express to send the index.html file when it receives a GET request for the / path. The second route tells Express to respond to a GET request for the /data path with the redditComments array as JSON.
Here's how the server.js file looks now:
var express = require("express");
var app = express();

var ml = require("ml-sentiment")();
var redditComments = require("./comments.json");

// Score every comment and attach a matching emoji.
redditComments.forEach(function(comment) {
  comment.sentiment = ml.classify(comment.body);
  if (comment.sentiment >= 5) {
    comment.emoji = "😃";
  } else if (comment.sentiment > 0) {
    comment.emoji = "🙂";
  } else if (comment.sentiment === 0) {
    comment.emoji = "😐";
  } else {
    comment.emoji = "😕";
  }
});

// Serve the webpage.
app.get("/", function(req, res) {
  res.sendFile(__dirname + "/index.html");
});

// Serve the analyzed comments as JSON.
app.get("/data", function(req, res) {
  res.json(redditComments);
});

// Use the PORT environment variable if it is set, otherwise port 3000.
const listener = app.listen(process.env.PORT || 3000, function() {
  console.log("Your app is listening on port " + listener.address().port);
});
It doesn't work just yet, because we haven't created the index.html file. Make a new file called index.html in your project folder and add this code:
<!DOCTYPE html>
<head>
<meta charset="utf-8" />
<link
href="https://cdnjs.cloudflare.com/ajax/libs/bulma/0.7.4/css/bulma.min.css"
rel="stylesheet"
/>
<style>
#main {
margin: 2rem;
}
.big {
font-size: 1.2rem;
}
</style>
</head>
<body>
<section class="hero is-success">
<div class="hero-body">
<div class="container">
<h1 class="title">How was your day?</h1>
<h2 class="subtitle">Sentiment analysis demo</h2>
</div>
</div>
</section>
<div id="main">
<table class="table is-fullwidth">
<thead>
<tr>
<th>Feeling</th>
<th>Score</th>
<th>Author</th>
<th>Comment</th>
</tr>
</thead>
<tbody id="sentimentTable"></tbody>
</table>
</div>
<script>
var request = new XMLHttpRequest();
request.open("GET", "/data", true);
request.onload = function() {
if (request.status >= 200 && request.status < 400) {
var table = document.getElementById("sentimentTable");
var data = JSON.parse(request.responseText);
data.forEach(function(comment) {
var newRow = table.insertRow(table.rows.length);
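// textContent inserts the Reddit data as plain text, so any HTML inside a
// comment is displayed literally instead of being injected into the page.
newRow.insertCell(0).textContent = comment.emoji;
newRow.insertCell(1).textContent = comment.sentiment;
var rowLink = document.createElement("a");
rowLink.textContent = comment.author;
rowLink.href = comment.link;
newRow.insertCell(2).appendChild(rowLink);
newRow.insertCell(3).textContent = comment.body;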
});
} else {
alert("Could not retrieve data");
}
};
request.onerror = function() {
alert("Could not retrieve data");
};
request.send();
</script>
</body>
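How does this work? The page includes a script that sends a GET request to /data. When the JSON response arrives, it adds a row to the table for each analyzed comment, showing its emoji, score, author (linked to the comment on Reddit), and text. If the request fails, it shows an alert instead.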
Everything is good to go! To run your program, go back to the terminal, make sure you are still in your project's directory, and run node server.js. Then open localhost:3000 in your browser (or the port printed in the terminal, if it differs). You should see our new webpage with the sentiment of each Reddit comment!
Notice how some comments contain negations, like "not bad", yet have a positive sentiment score. This is because the sentiment library we used has basic support for negation.
What's next
Try running your own text through the sentiment analyzer. For example, download your Twitter archive and analyze the sentiment of your tweets. Let us know about your projects in the comments below!