Student Work

Evaluating Energy Efficiency of State-of-the-art Hardware With MLIR Compiling Framework

Public Deposited


As distributed deep learning approaches exascale computing, GPU vendors have introduced proprietary micro-architectures to meet the demand. To make efficient use of these hardware features, vendors also ship their products with powerful, hand-tuned API frameworks and libraries. The challenge with this approach is that programmers need specialized experience to use these libraries to the fullest. Moreover, because the libraries are written at a lower level of abstraction than the modern programming languages used for machine learning applications, they reduce productivity by forcing programmers to write unavoidable boilerplate routines and code. We mitigate this problem by automating the lowering process and code generation for GPUs, from a high-level framework down to low-level, hardware-specific instructions, using the MLIR compiler framework. In this report, we demonstrate the process with automatically generated kernel code for 2D vector multiplication. Our experiments show that, for large matrix multiplications, our pipeline achieves near-peak throughput comparable to that of vendor-tuned code.
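
As a rough illustration of what a throughput claim for matrix multiplication involves, the sketch below counts the floating-point operations in a dense product and converts a measured runtime into achieved TFLOP/s and a fraction of peak. It is a minimal sketch, not the report's measurement code: the function names are hypothetical, and the problem size, runtime, and peak figure in the usage lines are placeholder values, not results from the report.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A of shape (m, k) and B of shape (k, n).

    Each of the m * n output elements needs k multiplies and k adds
    (counting the accumulate as one add per term), hence 2 * m * n * k.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOP/s for one matrix product timed at `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return matmul_flops(m, n, k) / seconds / 1e12


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Ratio of achieved to peak throughput; both in the same unit."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


if __name__ == "__main__":
    # Placeholder inputs (assumptions for illustration only).
    size = 4096               # square matrices, m = n = k
    measured_seconds = 0.010  # hypothetical kernel runtime
    peak_tflops = 20.0        # hypothetical hardware peak

    achieved = achieved_tflops(size, size, size, measured_seconds)
    print(f"achieved: {achieved:.2f} TFLOP/s")
    print(f"fraction of peak: {fraction_of_peak(achieved, peak_tflops):.2%}")
```

The 2 * m * n * k count is the usual convention for dense matrix multiplication; whether the report uses the same convention, data type, or peak figure is not stated in this abstract.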

  • This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
Creator
Publisher
Identifier
  • E-project-122122-161015
  • 84416
Advisor
Year
  • 2022
Date created
  • 2022-12-21
Resource type
Major
Source
  • E-project-122122-161015
Rights statement
Last modified
  • 2023-01-12


Permanent link to this page: https://digital.wpi.edu/show/ft848v087