Plagiarism checker website

user profile photouser profile photouser profile photo1539 developers have joined this project.

What you will practice

You'll learn how to build a web application with Flask and Bootstrap. You'll do text processing and web scraping to find similar content and automatically detect potential plagiarism.

Introduction

You'll be building an automated solution that handles plagiarism detection.

This might be used for publishing companies to replace a manual process in which they search for phrases from submitted manuscripts on Google to find pre-existing work.

Requirements

Your task is to build a web application where a user can upload a file (e.g. an MS Word document or Google Doc file) and get matches for similar content around the Internet.

In the back-end, your program should read the content from the uploaded file, extract some random phrases of around 5-10 words each, and run a Google search on them.

The program should then load the pages for each of the top five Google search results for each phrase and compare the content in that page with the content of the submitted file.

The program should then return a percentage of how similar the content is and also list the similar phrases and original URLs.

More specifically, the program should:

  • Consist of a web page where the user can upload a document.
  • Extract random phrases from the document.
  • Run a Google search against the random phrases.
  • Scrape the content from the top five Google search results for each phrase.
    • Clean the scraped content: remove headers etc, and just keep the main text.
  • Compare the content of the submitted file with each of the scraped results. You can use any text similarity metric, or make this customisable.
  • Return a percentage of how similar the uploaded content is to the scraped content.
  • Return the similar parts of the content with a link to the original scraped URL.

For an extra challenge: You can add a PDF generation pipeline that allows the user to download the results in a PDF formatted report.

Suggested Implementation

This project can be implemented with the Python programming language, the Flask web framework, ScrapingBee or a similar service to scrape Google search results, and BeautifulSoup to clean the HTML. You can use Bootstrap for the front-end.

Hit a programming wall?
Get help from our mentors

  • Post request free
  • First 15 mins free

Suggested languages and frameworks

PythonBootstrap

Difficulty

hard

Contributed by

Founder @ Ritza with 8+ years software engineering, leadership, and technical writing experience

Interested in this project?

Shorten your learning curve with on-demand programming help

The awesome set of verified mentors will provide guidance and mentoring help when you are stuck.

Suresh Atta

  • Post request free
  • First 15 mins free
Shorten your learning curve with on-demand programming help

Browse more projects

More coming soon...