Student Work

Database-Integrated Analytics

Close coordination between data analytics tools and database systems is essential if data scientists are to analyze data stored inside a database efficiently. There are currently three approaches to using data analysis tools with databases: client-server connection, in-database processing, and embedded database. This project compares the client-server connection approach with the in-database processing approach. Two machine learning models, Support Vector Machine (SVM) and Random Forest, are implemented with each approach and then tested on datasets of different scales. The in-database processing approach uses Apache MADlib, and the client-server connection approach is implemented in Python. The project compares the two approaches on run-time efficiency and test accuracy, then draws conclusions about the performance of each.
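The difference between the two approaches is mainly about where the computation runs. In the client-server approach, rows travel over the connection into client memory and are processed in Python. In in-database processing, the computation is expressed in SQL and only results leave the database. The sketch below illustrates that contrast under stated assumptions; it is not the project's code.

- Python's built-in `sqlite3` in-memory database stands in for the database server. MADlib itself runs on PostgreSQL/Greenplum.
- The table `points(x1, x2, label)` and every function name are hypothetical.
- The model is a toy linear SVM trained with Pegasos-style stochastic subgradient descent, which is swapped in for illustration.
- Random Forest is not shown.
- Only *scoring* is pushed into SQL here. MADlib goes further and trains models inside the database through SQL functions such as `madlib.svm_classification` and `madlib.forest_train`.

```python
import random
import sqlite3
import time


def make_toy_table(conn, n=200, seed=0):
    """Create a hypothetical table points(x1, x2, label) with two well-separated classes."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE points (x1 REAL, x2 REAL, label INTEGER)")
    rows = []
    for _ in range(n):
        label = rng.choice([-1, 1])
        # Class +1 is centered at (2, 2) and class -1 at (-2, -2).
        rows.append((2 * label + rng.gauss(0, 0.5), 2 * label + rng.gauss(0, 0.5), label))
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)
    conn.commit()


def fetch_rows(conn):
    """Client-server step: every row is transferred from the database into client memory."""
    return conn.execute("SELECT x1, x2, label FROM points").fetchall()


def train_linear_svm(rows, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    Linear kernel and no bias term (the toy data is symmetric about the origin).
    """
    rng = random.Random(seed)
    data = list(rows)
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x1, x2, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * (w[0] * x1 + w[1] * x2)
            # The regularization term shrinks w on every step ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and the hinge-loss subgradient applies only when the margin is violated.
            if margin < 1:
                w[0] += eta * y * x1
                w[1] += eta * y * x2
    return w


def client_side_accuracy(rows, w):
    """Score the model in Python on rows that were already pulled to the client."""
    correct = sum(1 for x1, x2, y in rows if y * (w[0] * x1 + w[1] * x2) > 0)
    return correct / len(rows)


def in_database_accuracy(conn, w):
    """Score the model inside the database: only one number crosses the connection."""
    sql = (
        "SELECT AVG(CASE WHEN label * (? * x1 + ? * x2) > 0 THEN 1.0 ELSE 0.0 END) "
        "FROM points"
    )
    return conn.execute(sql, (w[0], w[1])).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_toy_table(conn, n=10_000)
    w = train_linear_svm(fetch_rows(conn), epochs=5)  # trained client-side once

    start = time.perf_counter()
    acc_client = client_side_accuracy(fetch_rows(conn), w)  # fetch + score in Python
    t_client = time.perf_counter() - start

    start = time.perf_counter()
    acc_db = in_database_accuracy(conn, w)  # score entirely in SQL
    t_db = time.perf_counter() - start

    print(f"client-server: accuracy={acc_client:.3f}, {t_client:.4f} s")
    print(f"in-database:   accuracy={acc_db:.3f}, {t_db:.4f} s")
```

Both paths compute the same accuracy; they differ only in how much data moves. An in-memory SQLite database has no network hop, so these timings understate the transfer cost a real client-server deployment would pay. The sketch only shows where each computation happens. It is not a benchmark.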

  • This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
Identifier
  • E-project-050521-185127
  • 22406
Year
  • 2021
Date created
  • 2021-05-05

Permanent link to this page: https://digital.wpi.edu/show/vq27zr570