Automated News Aggregator
Abstract
Automated News Aggregator is a Python project that collects news articles from the web and summarizes them with NLP. The application features web scraping, extractive summarization, and a CLI interface, demonstrating best practices in information retrieval and NLP.
Prerequisites
- Python 3.8 or above
- A code editor or IDE
- Basic understanding of web scraping and NLP
- Required libraries: requests, beautifulsoup4, nltk, sumy
Before you Start
Install Python and the required libraries:
```bash
# Install dependencies
pip install requests beautifulsoup4 nltk sumy
```
Getting Started
Create a Project
- Create a folder named `automated-news-aggregator`.
- Open the folder in your code editor or IDE.
- Create a file named `automated_news_aggregator.py`.
- Copy the code below into your file.
Write the Code
automated_news_aggregator.py

```python
"""
Automated News Aggregator

Features:
- Web scraping (requests + BeautifulSoup)
- Extractive summarization (sumy LSA)
- CLI interface
- Error handling
"""
import requests
from bs4 import BeautifulSoup
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
import nltk

# sumy's Tokenizer uses nltk's punkt sentence tokenizer under the hood
nltk.download('punkt', quiet=True)


def scrape_news(url):
    """Fetch a page and return the text of all <p> elements."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    paragraphs = soup.find_all('p')
    return ' '.join(p.get_text() for p in paragraphs)


def summarize_text(text, sentences_count=5):
    """Condense text to the given number of sentences using LSA."""
    parser = PlaintextParser.from_string(text, Tokenizer('english'))
    summarizer = LsaSummarizer()
    summary = summarizer(parser.document, sentences_count)
    return ' '.join(str(sentence) for sentence in summary)


def main():
    print("Automated News Aggregator")
    while True:
        cmd = input('> ').strip()
        if cmd == 'aggregate':
            url = input("News URL: ")
            try:
                text = scrape_news(url)
                summary = summarize_text(text)
                print(f"Summary: {summary}")
            except Exception as e:
                print(f"Error: {e}")
        elif cmd == 'exit':
            break
        else:
            print("Unknown command. Type 'aggregate' or 'exit'.")


if __name__ == "__main__":
    main()
```
Example Usage
```bash
# Run the news aggregator
python automated_news_aggregator.py
```
Explanation
Key Features
- Web Scraping: Collects news articles from the web.
- Summarization: Uses NLP to summarize articles.
- Error Handling: Validates inputs and manages exceptions.
- CLI Interface: Interactive command-line usage.
Code Breakdown
- Import Libraries and Setup Scraper
automated_news_aggregator.py

```python
import requests
from bs4 import BeautifulSoup
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
import nltk
```
- Web Scraping and Summarization Functions
automated_news_aggregator.py

```python
def scrape_news(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    paragraphs = soup.find_all('p')
    return ' '.join(p.get_text() for p in paragraphs)

def summarize_text(text, sentences_count=5):
    parser = PlaintextParser.from_string(text, Tokenizer('english'))
    summarizer = LsaSummarizer()
    summary = summarizer(parser.document, sentences_count)
    return ' '.join(str(sentence) for sentence in summary)
```
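The paragraph-extraction step above relies on BeautifulSoup, but the same idea can be sketched with only the standard library's `html.parser`, which helps clarify what `find_all('p')` is doing. This is a minimal illustration, not part of the tutorial's code; the `ParagraphExtractor` class and sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> tags, mimicking soup.find_all('p')."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'p' and self.in_p:
            self.in_p = False
            self.paragraphs.append(''.join(self._buf).strip())

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)

html = "<html><body><p>First story.</p><div>nav</div><p>Second story.</p></body></html>"
extractor = ParagraphExtractor()
extractor.feed(html)
text = ' '.join(extractor.paragraphs)
print(text)  # First story. Second story.
```

BeautifulSoup is still the better choice in practice because it tolerates malformed real-world HTML, but the extraction logic is the same: collect text from every paragraph node and join it into one document string.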
- CLI Interface and Error Handling
automated_news_aggregator.py

```python
def main():
    print("Automated News Aggregator")
    while True:
        cmd = input('> ').strip()
        if cmd == 'aggregate':
            url = input("News URL: ")
            try:
                text = scrape_news(url)
                summary = summarize_text(text)
                print(f"Summary: {summary}")
            except Exception as e:
                print(f"Error: {e}")
        elif cmd == 'exit':
            break
        else:
            print("Unknown command. Type 'aggregate' or 'exit'.")

if __name__ == "__main__":
    main()
```
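As the CLI grows beyond two commands, the `if`/`elif` chain can be reorganized as a dispatch table, which keeps each command in its own function. A minimal sketch with hypothetical handlers (not part of the tutorial's code):

```python
def handle_aggregate():
    # In the real program this would prompt for a URL and summarize it
    return "aggregate selected"

def handle_help():
    return "Commands: aggregate, help, exit"

COMMANDS = {'aggregate': handle_aggregate, 'help': handle_help}

def dispatch(cmd):
    """Look up the handler for cmd; fall back to a usage message."""
    handler = COMMANDS.get(cmd)
    if handler is None:
        return "Unknown command. Type 'aggregate', 'help' or 'exit'."
    return handler()

print(dispatch('help'))  # Commands: aggregate, help, exit
```

Adding a new command then means writing one function and one dictionary entry, rather than editing the loop body.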
Features
- Automated News Aggregation: Web scraping and summarization
- Modular Design: Separate functions for scraping and summarizing
- Error Handling: Manages invalid inputs and exceptions
- Extensible: Structured so new sources and summarizers are easy to add
Next Steps
Enhance the project by:
- Supporting batch aggregation
- Creating a GUI with Tkinter or a web app with Flask
- Adding support for more summarization algorithms
- Unit testing for reliability
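The batch-aggregation idea from the first suggestion could look like the sketch below. `scrape_news` and `summarize_text` are stubbed here so the example runs standalone; in the real project you would import them from `automated_news_aggregator.py`, and the URLs are placeholders.

```python
def scrape_news(url):
    # Stub standing in for the real requests/BeautifulSoup scraper
    return f"Article text fetched from {url}."

def summarize_text(text, sentences_count=5):
    # Stub standing in for the real sumy LSA summarizer
    return text[:40]

def aggregate_batch(urls):
    """Summarize each URL independently so one failure doesn't stop the batch."""
    results = {}
    for url in urls:
        try:
            results[url] = summarize_text(scrape_news(url))
        except Exception as e:
            results[url] = f"Error: {e}"
    return results

summaries = aggregate_batch(['https://example.com/a', 'https://example.com/b'])
for url, summary in summaries.items():
    print(url, '->', summary)
```

Catching exceptions per URL (rather than around the whole loop) means one unreachable site produces an error entry instead of aborting the entire run.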
Educational Value
This project teaches:
- Information Retrieval: Web scraping and summarization
- Software Design: Modular, maintainable code
- Error Handling: Writing robust Python code
Real-World Applications
- News Aggregators
- Content Management
- Educational Tools
Conclusion
Automated News Aggregator demonstrates how to build a scalable and accurate news aggregation tool using Python. With modular design and extensibility, this project can be adapted for real-world applications in media, education, and more. For more advanced projects, visit Python Central Hub.