This set of three APIs stores Twitter streaming data and retrieves it based on applied filters.
Technologies used:
- Python/ Flask framework
- MongoDB (Hosted on MLab)
- Twitter Streaming API
Contents:
- Installation Instructions
- API 1 - API to trigger Twitter Stream
- API 2 - API to filter/search stored tweets
- API 3 - API to export filtered data in CSV
- clone the project
git clone https://github.com/gauravkulkarni96/twitter-streaming-filter-api.git
- cd to project folder
cd twitter-streaming-filter-api
- create virtual environment
virtualenv venv
- activate virtual environment
source venv/bin/activate
- install requirements
pip install -r requirements.txt
- run the server
python run.py
This API triggers Twitter streaming and stores a curated version of the data returned by the Twitter Streaming API. Streaming runs according to the given parameters.
API - http://127.0.0.1:5000/stream/<keyword>?[parameters]
(methods supported - GET, POST)
Where <keyword> can be any keyword for which streaming needs to be performed and [parameters] are as follows:
Parameter | Action |
---|---|
count | streaming runs until the given number of tweets is received |
time | streaming runs for the given time (in seconds) |
Examples:
http://127.0.0.1:5000/stream/modi?count=5 (runs till 5 tweets are fetched)
http://127.0.0.1:5000/stream/modi?time=10 (runs for 10 seconds)
http://127.0.0.1:5000/stream/modi?count=5&time=10 (stops at whichever comes first: 5 tweets or 10 seconds)
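
The example URLs above can also be built programmatically. The following is a minimal sketch, assuming the server runs locally on port 5000 as in the examples; `build_stream_url` is a hypothetical helper name and not part of the project.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:5000"  # local development server used in the examples


def build_stream_url(keyword, count=None, time=None):
    """Return the /stream URL for a keyword with optional count/time limits."""
    params = {}
    if count is not None:
        params["count"] = count
    if time is not None:
        params["time"] = time
    # Percent-encode the keyword so spaces or '#' do not break the path.
    url = f"{BASE_URL}/stream/{quote(keyword, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


print(build_stream_url("modi", count=5, time=10))
# http://127.0.0.1:5000/stream/modi?count=5&time=10
```
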
The response contains the following fields:

Parameter | Meaning |
---|---|
code | 0 (successful) / 1 (failed) |
message | error message if the API call fails |
status | success/failed |
Examples:
- Successful response
{
"code": "0",
"message": "Successful",
"status": "success"
}
- Failed Response
{
"code": "1",
"message": "No Parameters Passed",
"status": "failed"
}
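
A client may want to check the outcome of a stream request in code. The sketch below assumes the response body is JSON shaped like the examples above; `stream_succeeded` is a hypothetical helper name.

```python
import json


def stream_succeeded(body):
    """Return (ok, message) for a JSON response body from the stream API."""
    data = json.loads(body)
    # "code" is a string in the examples above ("0" or "1"), so compare as text.
    ok = str(data.get("code")) == "0"
    return ok, data.get("message", "")


failed = '{"code": "1", "message": "No Parameters Passed", "status": "failed"}'
print(stream_succeeded(failed))
# (False, 'No Parameters Passed')
```
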
This API fetches the data stored by the first API, applies the given filters and search keywords, and sorts the results as requested.
API - http://127.0.0.1:5000/search?[filters][sort][page]
(methods supported - GET, POST)
Following are the elements of the API:
The filters follow the format <filter>=<value>, where <filter> can be one or more of the filters listed below and <value> should be in the specified format.
The following filters can be applied:
Filter | Meaning | Value Format (refer to the tables below) | Example |
---|---|---|---|
hashtag | filter tweets by hashtags in tweet (case insensitive) | <hashtag> | hashtag=AbKiBaarModiSarkar |
keyword | filter tweets by keyword which was used in API 1 for streaming | <keyword> | keyword=modi |
name | filter tweets by name/screen_name of users (case insensitive) | <textFilterType>-<filterValue> | name=co-gaurav |
location | location of the user posting the tweet | <location> | location=delhi |
text | filter tweets by content (case insensitive) | <textFilterType>-<filterValue> | text=sw-gaurav |
type | filter tweets as retweets/quote/original tweets | original/retweet/quote | type=retweet |
mention | filter tweets by user mentions (case insensitive) | <textFilterType>-<filterValue> | mention=em-gauravkul96 |
followers | number of followers of the user | <numFilterType><filterValue> | followers=lt100 |
rtcount (mostly 0 in streaming) | retweet count of tweet | <numFilterType><filterValue> | rtcount=gt100 |
favcount (mostly 0 in streaming) | favourite count of tweet | <numFilterType><filterValue> | favcount=lt100 |
lang | language of tweet | language code in BCP 47 format | lang=en |
datestart | tweets posted on or after a specific date | dd-mm-yyyy | datestart=10-01-2018 |
dateend | tweets posted on or before a specific date | dd-mm-yyyy | dateend=28-02-2018 |
In the format <textFilterType>-<filterValue>, <filterValue> can be any string and <textFilterType> can be
textFilterType | Meaning |
---|---|
sw | starts with |
ew | ends with |
co | contains |
em | exact match |
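
To make the text filter format concrete, here is a minimal sketch of how a value such as `sw-gaurav` could be turned into a case-insensitive regular expression. This is an illustration only, not the project's actual implementation; `parse_text_filter` and the pattern table are hypothetical.

```python
import re

# Hypothetical mapping from textFilterType to a regex template.
_TEXT_PATTERNS = {
    "sw": "^{}",   # starts with
    "ew": "{}$",   # ends with
    "co": "{}",    # contains
    "em": "^{}$",  # exact match
}


def parse_text_filter(raw):
    """Split '<textFilterType>-<filterValue>' and return a case-insensitive regex."""
    # Split on the first hyphen only, so the value itself may contain hyphens.
    filter_type, sep, value = raw.partition("-")
    if not sep or filter_type not in _TEXT_PATTERNS or not value:
        raise ValueError(f"invalid text filter: {raw!r}")
    # Escape the value so characters like '.' are matched literally.
    pattern = _TEXT_PATTERNS[filter_type].format(re.escape(value))
    return re.compile(pattern, re.IGNORECASE)


print(bool(parse_text_filter("sw-gaurav").search("Gaurav Kulkarni")))
# True
```

A compiled pattern like this could then be used directly in Python or passed as a regex condition in a MongoDB query.
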
In the format <numFilterType><filterValue>, <filterValue> can be any number and <numFilterType> can be
numFilterType | Meaning |
---|---|
gt | greater than |
lt | less than |
eq | equal to |
ge | greater than or equal to |
le | less than or equal to |
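
As an illustration of the numeric filter format, the sketch below converts a value such as `lt100` into a MongoDB comparison condition. It assumes integer values (counts) and is not taken from the project's code; `parse_num_filter` is a hypothetical name.

```python
# MongoDB comparison operators for each numFilterType.
_NUM_OPERATORS = {
    "gt": "$gt",
    "lt": "$lt",
    "eq": "$eq",
    "ge": "$gte",
    "le": "$lte",
}


def parse_num_filter(raw):
    """Turn '<numFilterType><filterValue>' (e.g. 'lt100') into a MongoDB condition."""
    # The filter type is always the first two characters; the rest is the number.
    filter_type, value = raw[:2], raw[2:]
    if filter_type not in _NUM_OPERATORS:
        raise ValueError(f"unknown numFilterType in {raw!r}")
    # int() raises ValueError if the value is missing or not an integer.
    return {_NUM_OPERATORS[filter_type]: int(value)}


print(parse_num_filter("lt100"))
# {'$lt': 100}
```

The returned condition could be attached to a document field in a query, for example `{"retweet_count": parse_num_filter("gt100")}`.
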
By default, results are sorted by tweet date in descending order. Other sort orders can be requested with the sort parameter in the format <sortField>-<order>, where <order> can be
order | Meaning |
---|---|
asc | Ascending order |
dsc | descending order |
and <sortField> can be
sortField | Meaning | Example |
---|---|---|
name | sort by name | sort=name-asc |
sname | sort by screen name | sort=sname-dsc |
text | sort by tweet text | sort=text-asc |
fav | sort by favourites count | sort=fav-asc |
ret | sort by retweet count | sort=ret-dsc |
followers | sort by follower count of user | sort=followers-asc |
date | sort by date | sort=date-asc |
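
The sort parameter can be parsed along the following lines. This is a sketch under assumptions: the mapping from sortField to stored document fields is hypothetical (in particular `user.followers_count`, which does not appear in the sample response), and the directions 1 and -1 match the values of `pymongo.ASCENDING` and `pymongo.DESCENDING`.

```python
# Hypothetical mapping from sortField to stored document fields; the real
# field names in the project's MongoDB collection may differ.
_SORT_FIELDS = {
    "name": "user.name",
    "sname": "user.screen_name",
    "text": "text",
    "fav": "favorite_count",
    "ret": "retweet_count",
    "followers": "user.followers_count",
    "date": "created_at",
}
_ORDERS = {"asc": 1, "dsc": -1}  # same values as pymongo.ASCENDING / DESCENDING


def parse_sort(raw=None):
    """Return a (field, direction) pair; defaults to date descending."""
    if not raw:
        return ("created_at", -1)
    sort_field, sep, order = raw.partition("-")
    if not sep or sort_field not in _SORT_FIELDS or order not in _ORDERS:
        raise ValueError(f"invalid sort parameter: {raw!r}")
    return (_SORT_FIELDS[sort_field], _ORDERS[order])


print(parse_sort("sname-dsc"))
# ('user.screen_name', -1)
```
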
The API is paginated and returns 10 results per call. The page number can be specified in the API call as page=[pageNo], for example page=5. If no page number is specified, page 1 is returned.
Examples
http://127.0.0.1:5000/search?favcount=lt1000&lang=en&datestart=10-01-2018&sort=date-asc
http://127.0.0.1:5000/search?name=co-gaurav&datestart=10-01-2018&dateend=15-01-2018&sort=text-asc&page=2
http://127.0.0.1:5000/search?rtcount=gt100
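
Search URLs like the ones above can be assembled from a dictionary of filters. The following is a minimal sketch, assuming the local server address from the examples; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:5000"  # local development server used in the examples


def build_search_url(filters, sort=None, page=None):
    """Return a /search URL from a dict of filters plus optional sort and page."""
    params = dict(filters)  # copy so the caller's dict is not modified
    if sort is not None:
        params["sort"] = sort
    if page is not None:
        params["page"] = page
    query = urlencode(params)
    return f"{BASE_URL}/search?{query}" if query else f"{BASE_URL}/search"


print(build_search_url(
    {"name": "co-gaurav", "datestart": "10-01-2018", "dateend": "15-01-2018"},
    sort="text-asc",
    page=2,
))
# http://127.0.0.1:5000/search?name=co-gaurav&datestart=10-01-2018&dateend=15-01-2018&sort=text-asc&page=2
```
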
The response contains the following fields:

Parameter | Meaning |
---|---|
page | current page number |
next_page | next page number (1 if current page is last page) |
last_page | Boolean true/false (true if current page is last page else false) |
result | list of tweet objects that match the given filters |
result_count | total number of matching results |
Examples
{
"next_page": 1,
"last_page": true,
"result": [{"lang": "en", "_id": "5a83f5063fe5103329f1f788", "text": "RT @LalitKModi: Thank you #RichardMadley \ud83d\ude4f\ud83c\udffb most appreciative of your kind words https://t.co/erkxF1q46i", "created_at": "2018-02-14 08:36:16+00:00", "hashtags": ["RichardMadley"], "retweet_count": 0, "user_mentions": ["LalitKModi"], "is_quote_status": false, "user": {"screen_name": "LaraeGalang3", "location": null, "_id": "5a83f5053fe5103329f1f786", "id": 963690891935473666, "name": "Larae Galang"}, "id": 963693359742312448, "favorite_count": 0, "is_retweet": true}],
"page": 1,
"result_count": 1
}
{
"next_page": 3,
"last_page": false,
"result": [...],
"page": 2,
"result_count": 43
}
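
Because each call returns a single page, a client that wants every match has to follow `next_page` until `last_page` is true. The sketch below shows one way to do that, assuming the response fields described above; `fetch_json` and `iter_all_results` are hypothetical names, and the search URL passed in should not already contain a `page` parameter.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs the server to be running)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_all_results(search_url, fetch=fetch_json):
    """Yield every tweet for a search URL by walking pages until last_page is true."""
    page = 1
    separator = "&" if "?" in search_url else "?"
    while True:
        data = fetch(f"{search_url}{separator}page={page}")
        yield from data["result"]
        if data["last_page"]:
            break
        page = data["next_page"]


# Usage, with the server running locally:
# for tweet in iter_all_results("http://127.0.0.1:5000/search?lang=en"):
#     print(tweet["text"])
```

The `fetch` argument is injectable so the loop can be exercised without a live server.
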
This API returns the data in CSV. If opened in a browser, it downloads a CSV file containing the data; if called from another program, it returns the data in CSV format.
API - http://127.0.0.1:5000/getcsv?[filters][sort]
(methods supported - GET, POST)
[filters] and [sort] are the same parameters as defined for the second API. There is no [page] parameter because all matching data is returned.
Examples
http://127.0.0.1:5000/getcsv?hashtag=richardmadley
http://127.0.0.1:5000/getcsv?favcount=lt1000&lang=en&datestart=10-01-2018&sort=date-asc
If the request to the API is sent using a browser, it downloads a CSV file containing the data that matches the filters. If the request is sent by another program or application, such as Postman, the API returns the data in CSV format.
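
A program consuming the export can parse the returned text with Python's standard `csv` module. The sketch below assumes the first CSV row is a header and the body is UTF-8 encoded, neither of which is stated above; `parse_csv` and `fetch_csv_rows` are hypothetical names, and the column names in the sample string are made up for illustration.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv_rows(url):
    """Download CSV from the export API and parse it (needs the server running)."""
    with urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))


# Illustrative input only; the real export's columns may differ.
sample = "id,text\n1,hello\n2,world\n"
print(len(parse_csv(sample)))
# 2

# Usage, with the server running locally:
# rows = fetch_csv_rows("http://127.0.0.1:5000/getcsv?hashtag=richardmadley")
```
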