PROJECT BRIEF: Social Media & News Comment Monitoring System (Data Collection + NLP + Admin Panel + Power BI Integration)
1. Project Overview
We are looking for an experienced full-stack developer or development team to build a complete monitoring system for collecting, processing, analysing and exporting social media and news comment data. The system must automatically fetch content, perform text cleaning and classification, run sentiment analysis, and make the processed data available for dashboards in Power BI.
The project consists of four core components:
Data acquisition
Data processing & NLP
Admin panel
Power BI integration
Hosting can be on AWS / Azure / VPS (developer may propose optimal architecture).
2. DATA ACQUISITION (APIs + Legal Scraping)
The developer must be able to implement:
2.1 Public Data Sources
X/Twitter API (official API only)
YouTube API (video search, metrics, comments)
Reddit API
News portals with public comment sections (scraping only where legally permitted)
Facebook/Meta data only if possible via legal Graph API endpoints (no scraping; must comply with Meta Platform Policy: https://developers.facebook.com/policy/
)
2.2 Data Collection Requirements
Configurable fetch frequency (e.g., every hour / once per day)
Logging of errors and API limits
Storage in a structured SQL database (PostgreSQL preferred, but open to alternatives)
Expected data volume (initial estimate): 5,000–50,000 rows per day
Scalable architecture to support higher volumes later
2.3 Data Structure
Each collected item should include at minimum:
Date/time
Platform
Source URL / ID
Full text
Author (if available & legally permissible)
Number of likes / replies / views (if applicable)
3. DATA PROCESSING & NLP PIPELINE
The developer must create an automated processing pipeline with:
3.1 Text Pre-Processing
Removal of HTML, special characters, emojis, spam patterns
Language detection: LV, RU, EN
Tokenisation and normalisation
3.2 Automated Keyword & Topic Classification
Keyword matching for predefined keyword lists
Automatic extraction of additional keywords
Optional: Topic modelling (LDA or similar), if feasible
3.3 Sentiment Analysis
Sentiment classification (positive / negative / neutral)
Developer may use an existing open-source model or fine-tune a lightweight model
Accuracy must be optimised for Latvian/English/Russian if possible
3.4 Output Table
For each processed text:
Date
Platform
Source
Text
Sentiment value
Topic / category
Identified keywords
Metrics (likes, shares, views, etc.)
4. ADMIN PANEL (Web Application)
A secure admin panel where non-technical users can manage monitoring settings.
4.1 Core Features
Add / edit / delete keywords and keyword groups
Create thematic monitoring groups (e.g., “Elections”, “Education”, “Human rights”)
Enable / disable data sources
Filter by:
Date range
Platform
Language
Sentiment
Keywords
View summary statistics (optional basic charts)
4.2 Users & Authentication
Initially 1–3 admin users
Standard login system (JWT or session-based)
Role-based access (optional)
5. POWER BI INTEGRATION
Developer must provide one of the following:
Option A — Direct API Feed
Provide an API endpoint that Power BI can refresh automatically.
Option B — Export to Files
Periodic export of data into:
CSV
JSON
Parquet (optional)
Stored on AWS S3 / Azure Blob / server folder for Power BI ingestion.
Power BI Requirements
The exported/served table must include:
date
platform
source
text
sentiment
topic
keywords
metrics columns
unique ID
timestamp of last update
Automatic scheduled refresh is required.
6. TECHNICAL REQUIREMENTS
6.1 Preferred Tech Stack (flexible)
Backend: Python (FastAPI / Flask) or Node.js
Frontend: React / Vue / standard admin template
Database: PostgreSQL or MySQL
NLP: Python (spaCy, NLTK, transformers)
Hosting: AWS / Azure / DigitalOcean / other (developer may propose)
6.2 Documentation
The developer must deliver:
API documentation
Database schema
Instructions on deployment
Admin panel user guide
6.3 Code Ownership
All code and database schema must be fully delivered with full usage rights.
7. WHAT TO INCLUDE IN YOUR PROPOSAL
Developers should provide:
Description of relevant experience in:
API integrations
NLP / sentiment analysis
Web scraping
Admin panel development
Power BI integrations
Proposed tech stack
Estimated timeframe
Estimated cost (fixed price or hourly)
Clarifications/questions if any requirements are unclear
8. Additional Notes
Project must comply with all platform policies (Meta, YouTube, X, Reddit, etc.).
No illegal scraping of Facebook is allowed.
Scalability and reliability are important.
GDPR
Please also integrate Google Trends (via Pytrends or another available method) to monitor search interest for selected keywords in Latvia. The admin panel should allow adding keywords, and the system should store the search popularity time series (daily or weekly) in the database for Power BI.
Show More