November 30th - December 2nd, 2022

FAI Summit 2022

Join us at one of the fastest-growing Artificial Intelligence Summits, where stakeholders, industry experts, and AI enthusiasts network and explore the future of AI.


Why Attend FAI Summit

Learn from top AI and ML industry leaders
Unmissable Discussions in panel sessions
Explore AI Use Cases and Trends
Hands-on Networking
Discover the latest AR and VR
Emerging tech Startups
Advance your career
Unlock the power of Sustainable AI

Meet Some of our Speakers

Deepak Rana
Founder, CEO
ThinkScan Technologies
Janise McNair
Associate Professor
University of Florida
Abiodun Musa Aibinu
Vice Chancellor
Summit University, Offa, Nigeria
PhD in Mechatronics, Robotics and Automation Engineering
Tiffany Rios Live
Digital Solution Area Specialist
Microsoft
Dave Ojika
Founder, CEO
Flapmax
Seyong Lee
Senior Computer Scientist,
Oak Ridge National Laboratory
Amos Omokpo 
Software Support Automation Manager, Symbotic
Founder & CTO, twoMatches 
Prashanth Thinakaran 
Member of Technical Staff,
Cerebras 
Faika Bashoglu 
Assistant Professor, European University of Lefke
Benjamin Udokwu
ESG Advisor & VC Scout, Flapmax
Venture Partner,
Republic 
Clara M. Mosquera Lopez
AI/Machine Learning Research Scientist,
Assistant Professor,
Oregon Health and Science University 
Joo-Young Kim
Assistant Professor,
KAIST
Director of AISS 
Shinjae Yoo 
Computational Scientist & ML Group Lead, BNL
Director of AI & ML at
Scale
Aman Arora
PhD Graduate Fellow,
University of Texas at Austin
Luigi Meschini
Business Development
Rainmaking
Head of Accelerator Funding Network -
The Accelerator Network Ltd

What is FAI Summit?

A space for entrepreneurs, acclaimed C-suite executives, industry leaders, and tech lovers alike to dive into the realm of Artificial Intelligence and Technology within the African ecosystem through a mix of physical and virtual events. Expect presentations, cutting-edge discussions, networking, sustainability solutions, and much more. Register to participate, learn, or play today!

Why Attend FAI Summit?

Three days filled with unmissable workshops, speakers, industry-leading experts, leading tech brands, startups, networking opportunities, and more. FAI Summit brings together like-minded, innovative go-getters, both physically and virtually, to spark meaningful discussions and healthy relationships. It is an opportunity for aspiring individuals and veterans of the tech industry to join forces in discussing the impact of Artificial Intelligence on modern society, as AI has moved to the forefront of technology adoption for digital transformation.

Segments

AI Builders Garage

Emerging Tech

Women In Tech

Startup AI

VC Roundtable

Sustainability

Panel Session

Research

FAI Certification

Community

Youth Entrepreneurship

Unlocking data is key to improving business efficiency. Artificial Intelligence is one of the technologies that is critical in the utilization of big data.

Developers Hackathon

Making seamless mobility a reality in Africa

FAI Summit Agenda

13:00 - 13:10 WAT
07:00 - 07:10 EST
10 Minutes
Opening Remarks
Opening Remarks
Dave Ojika PhD
Founder, CEO
Flapmax
13:10 - 13:30 WAT
07:10 - 07:30 EST
20 Minutes
Keynote Address I
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: Digital Transformation)
Professor Musa Aibinu (PhD in Mechatronics and AI)
Vice-Chancellor, Summit University, Offa, Kwara State, Nigeria
13:30 - 13:45 WAT
07:30 - 07:45 EST
15 Minutes
Presentation
Presentation: Automation
Amos Omokpo
Manager, Software Support Automation, Symbotic
13:45 - 13:50 WAT
07:45 - 07:50 EST
5 Minutes
Break
Break and Music Interlude
13:50 - 14:05 WAT
07:50 - 08:05 EST
15 Minutes
Presentation
Presentation: Sustainability
Benjamin Udokwu
Managing Partner, Climatr
14:05 - 14:45 WAT
08:05 - 08:45 EST
40 Minutes
Panel Session
Panel Session: Sustainability  
MODERATOR
Divya Vellanki
Product Management, FAI Institute
PANELISTS
Innocent Orikiiriza
CEO, KaCyber
Hank Selke
MD, CEO, SnarkHealth
Benjamin Udokwu
Managing Partner, Climatr
14:45 - 15:05 WAT
08:45 - 09:05 EST
20 Minutes
Emerging Tech
Blockchain
Nancy Min
Founder, Ecolong
15:05 - 15:10 WAT
09:05 - 09:10 EST
5 Minutes
Break
Break and Music Interlude  
15:10 - 15:50 WAT
09:10 - 09:50 EST
40 Minutes
VC Roundtable
VC Roundtable
MODERATOR
Eric Edokpa
Business Development Manager, Flapmax
PANELISTS
Matthew Mardsen
Founder, Dealbase Africa
Luigi Meschini
Head of Accelerator Funding Network, The Accelerator Network
Maksymilian Kusmierek
Managing Director, Galactiv EooD
Avik Ashar
Partner, Saison Capital
15:50 - 16:20 WAT
09:50 - 10:20 EST
30 Minutes
Panel Session
Panel Session: “Digital Transformation & Women in AI”  
MODERATOR
Betty Wairegi
PANELISTS
Dapiriye Briggs
Onikepo Amodu
Itunu Gbadamosi
Sophia Ekeh
Kamali Mathiazhagan
Modupeola Savage
16:20 - 16:25 WAT
10:20 - 10:25 EST
5 Minutes
Break
Break and Music Interlude
16:25 - 16:45 WAT
10:25 - 10:45 EST
20 Minutes
Keynote Address II
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: Cybersecurity + Edge AI)
Deepak Rana
Founder, CEO, ThinkScan
16:45 - 17:05 WAT
10:45 - 11:05 EST
20 Minutes
Emerging Tech
Virtual Reality
Benjamin Lok, PhD
Computer Science Professor and Entrepreneur, University of Florida
17:05 - 17:30 WAT
11:05 - 11:30 EST
25 Minutes
Panel Session
Panel Session: AI & Financial Services “Fintech Services in Africa: Use of AI to enable safe transactions”    
MODERATOR
Eric Edokpa
Business Development Manager, Flapmax
PANELISTS
Isa Aliyu Shata
VP/CEO, Kongapay
Amitesh S.
CTO, Capsa Technology
Mustapha Suberu
CEO, Capsa Technology
17:30 - 17:35 WAT
11:30 - 11:35 EST
5 Minutes
Break
Break and Music Interlude  
17:35 - 17:55 WAT
11:35 - 11:55 EST
20 Minutes
Emerging Tech
AR/VR
Nilanjan Goswami, PhD
Graphics Architect, Meta
17:55 - 18:10 WAT
11:55 - 12:10 EST
15 Minutes
Presentation
Presentation: Healthcare
Faika Bashoglu, PhD
Pharmacist and Assistant Professor, European University of Lefke
18:10 - 18:40 WAT
12:10 - 12:40 EST
30 Minutes
Panel Session
Panel Session: Healthcare
MODERATOR
G Anthony Reina, PhD, MD
Senior Data Scientist, Resilience
PANELISTS
Faika Bashoglu, PhD
Pharmacist and Assistant Professor, European University of Lefke
Hank Selke
MD, Co-Founder and CEO, SnarkHealth
Dr. Trish Scanlan
CEO, We are TLM
18:40 - 18:55 WAT
12:40 - 12:55 EST
15 Minutes
Presentation
Presentation: “AI in Healthcare”
Rashwan Dany
SnarkHealth Project's Internship, University of Florida
18:55 - 19:00 WAT
12:55 - 13:00 EST
5 Minutes
Closing Remarks
Acknowledgement/ Closing Remarks
Abiola Jimoh
Senior Programs Manager, Flapmax
13:00 - 13:20 WAT
07:00 - 07:20 EST
20 Minutes
Keynote Address III
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: Women in AI)
Tiffany Rose Live
Microsoft, Wentors' Ambassador, Wentors
13:20 - 13:50 WAT
07:20 - 07:50 EST
30 Minutes
Panel Session
Panel Session: “Digital Transformation & Women in AI”
MODERATOR
Adepeju Shittu
PANELISTS
Divya Vellanki
Anuoluwapo Tayo-Alabi
Maureen Mbugua
Margaret Medina
Ufua Ameh
Wentors
13:50 - 13:55 WAT
07:50 - 07:55 EST
5 Minutes
Break
Break and Music Interlude  
13:55 - 14:15 WAT
07:55 - 08:15 EST
20 Minutes
Emerging Tech
Emerging Tech: Robotics
Professor Abiodun Musa Aibinu PhD (Mechatronics and AI)
Vice-Chancellor, Summit University, Offa, Kwara State, Nigeria
14:15 - 14:55 WAT
08:15 - 08:55 EST
40 Minutes
Startup AI
Startup AI: Lightning Talks
MODERATOR
Dave Ojika
CEO, Flapmax
SPEAKERS
Edwin Lubanga
CTO, SnarkHealth
Amitesh S.
CTO, Capsa Technology
Mustapha Suberu
Co-Founder and CEO, Capsa Technology
Paulus Indongo
CTO, K-12Plus
14:55 - 15:00 WAT
08:55 - 09:00 EST
5 Minutes
Break
Break and Music Interlude  
15:00 - 15:20 WAT
09:00 - 09:20 EST
20 Minutes
Startup AI
Startup AI: Lightning Talks (Continuation)
MODERATOR
Dave Ojika
CEO, Flapmax
SPEAKERS
Samuel Ogbujimma
CTO, LegitCar
Amsatou Mbengue, PhD
CTO, KaCyber
15:20 - 15:40 WAT
09:20 - 09:40 EST
20 Minutes
Keynote Address IV
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: AI/IoT in Agriculture)
Janise McNair
Associate Professor, University of Florida
15:40 - 15:55 WAT
09:40 - 09:55 EST
15 Minutes
Presentation
Presentation: “AI & Agriculture”
Sadiq Falalu, MFR
CEO, Falgates
15:55 - 16:25 WAT
09:55 - 10:25 EST
30 Minutes
Panel Session
Panel Session: Cyberinfrastructure, Apps, and Security
MODERATOR
Janise McNair
Associate Professor, University of Florida
PANELISTS
Jumoke Oloyede
Senior Specialist Threat Intelligence and Hunting, MTN Group
Cat S.
Principal Adversary Emulation Engineer, MITRE
Sadiq Falalu
CEO, Falgates
Rich Wurden
CEO, Aigen
16:25 - 16:30 WAT
10:25 - 10:30 EST
5 Minutes
Break
Break and Music Interlude  
16:30 - 16:50 WAT
10:30 - 10:50 EST
20 Minutes
Emerging Tech
Emerging Tech: AI
Haitang Wang, PhD
Data Scientist, Amazon
16:50 - 17:05 WAT
10:50 - 11:05 EST
15 Minutes
Presentation
Presentation: AI & Edutech
Emeka Nzeih
Head Professional Services and Certification, Digital Bridge Institute
17:05 - 17:35 WAT
11:05 - 11:35 EST
30 Minutes
Panel Session
Panel Session: AI & Education
MODERATOR
Eric Edokpa
Business Development Manager, Flapmax
PANELISTS
Ozaal Zesha
CEO, ClassNotes
Emeka Nzeih
Head Professional Services and Certification, Digital Bridge Institute
Paulus Indongo
Co-Founder and CTO, K-12Plus
Lisa Woodson
Project Coordinator, Continuing and Professional Development, San Jacinto
17:35 - 17:40 WAT
11:35 - 11:40 EST
5 Minutes
Break
Break and Music Interlude
17:40 - 18:00 WAT
11:40 - 12:00 EST
20 Minutes
AI Builders Garage
AI Builders Garage: Winners Announcements
Benard Irungu
Senior Programs Manager, Flapmax
18:00 - 18:05 WAT
12:00 - 12:05 EST
5 Minutes
Closing Remarks
Acknowledgement and Closing Remarks
Sheila Perocho
HR and Admin Manager, Flapmax
10:30 - 10:35
5 Minutes
Welcome Address
Welcome Address
Dave Ojika
Flapmax

Keynote: SHREC is an NSF Center for Space, High-Performance, and Resilient Computing. In this talk, we will give an overview of the SHREC Center, its composition, and its mission, and present four active projects at the University of Florida site of SHREC under the umbrella of Heterogeneous Computing for Data Science:
Compute Cache Hierarchy: focuses on Compute-near-Memory technologies such as FPGAs and Compute-in-Memory technologies such as PIM (Process-in-Memory) and IPU (Intelligent Processing Unit) devices.
Heterogeneous Point Cloud Net (HgPCN): a heterogeneous architecture for embedded 3-D point-cloud inference that aims to satisfy the stringent real-time requirements of applications on the computing edge.
Productive Computational Science (PCS) Platform: provides a programming abstraction that is accelerator-system agnostic, focusing on scalability and productivity to meet the demands of rapidly changing AI workloads and heterogeneous architectures.
End-to-end ML Pipeline: leverages Intel’s AI Analytics Toolkit to develop an end-to-end ML pipeline using oneAPI.

Herman Lam
Associate Professor, University of Florida

Bio: Herman Lam is an Associate Professor of Electrical and Computer Engineering at the University of Florida. Currently, his main research interest is in heterogeneous computing (HGC) and reconfigurable computing (RC), focusing on methods and tools for the acceleration and deployment of scientifically impactful applications on scalable RC and HGC systems. He was a Co-PI of the 2012 Alexander Schwarzkopf Prize for Technology Innovation from the National Science Foundation for “Novo-G: An innovative and synergistic research project and the world’s most powerful reconfigurable supercomputer”. Dr. Lam has authored or co-authored over 175 refereed conference and journal articles and one textbook. He served as the Associate Director of CHREC, the NSF Center for High-Performance Reconfigurable Computing. Currently, Dr. Lam is the University of Florida Site Director of the NSF Center for Space, High-Performance, and Resilient Computing. Academically, Dr. Lam was the Director of the Computer Engineering undergraduate program in the College of Engineering at the University of Florida from 2012-2021.  

Contact information: http://www.hlam.ece.ufl.edu/ hlam@ufl.edu

10:35 - 11:30
55 Minutes
Keynote Session
Heterogeneous Computing for AI @ SHREC
Herman Lam
University of Florida and NSF SHREC

Abstract: By providing highly efficient one-sided communication with a globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGAs are getting attention as an alternative compute platform for HPC systems, with the benefits of custom computing and design flexibility. However, unlike the traditional message passing interface, PGAS has not yet been explored on FPGAs. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, which is more than 95% of the theoretical maximum, outperforming prior works by 9.5×. It records 0.35 µs and 0.59 µs latency for remote write and read operations, respectively. Finally, we conduct a case study on two Intel D5005 FPGA nodes integrating Intel's deep learning accelerator. The two-node system programmed by FSHMEM achieves 1.94× and 1.98× speedup for matrix multiplication and convolution operations, respectively, showing its scalability potential for HPC infrastructure.
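
To make the PGAS idea in the abstract concrete, here is a minimal software-only sketch. It assumes a toy model in which each node owns one partition of a single global address space and any node can write to or read from a remote partition directly (one-sided access), without the owner taking part. The names `ToyPGAS`, `put`, and `get` are hypothetical illustrations; they are not the FSHMEM or GASNet API.

```python
class ToyPGAS:
    """Toy partitioned global address space: one list-backed partition per node."""

    def __init__(self, num_nodes, words_per_node):
        self.words_per_node = words_per_node
        # Each node owns a contiguous slice of the global address range.
        self.partitions = [[0] * words_per_node for _ in range(num_nodes)]

    def _locate(self, global_addr):
        # Map a global address to (owner node, local offset).
        node, offset = divmod(global_addr, self.words_per_node)
        if not 0 <= node < len(self.partitions):
            raise IndexError("global address out of range")
        return node, offset

    def put(self, global_addr, value):
        # One-sided remote write: the caller stores directly into the owner's partition.
        node, offset = self._locate(global_addr)
        self.partitions[node][offset] = value

    def get(self, global_addr):
        # One-sided remote read: the caller loads directly from the owner's partition.
        node, offset = self._locate(global_addr)
        return self.partitions[node][offset]


space = ToyPGAS(num_nodes=2, words_per_node=4)
space.put(5, 42)          # global address 5 lives on node 1, offset 1
print(space.get(5))
```

In this sketch the address-to-owner mapping is a plain division; a hardware framework would resolve ownership and move the data over the interconnect instead.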

Yashael Faith Arthanto
Hardware Engineer, Rebellions AI

Bio: Yashael Faith Arthanto is from Indonesia. He received a BS degree from Bandung Institute of Technology, Indonesia, in 2019 and an MS degree in EEE from KAIST, South Korea, in 2022. He now works for Rebellions, an AI chip startup in South Korea.

Contact Information: yashael.faith@alumni.kaist.ac.kr

11:30 - 12:00
30 Minutes
Session 1
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
Yashael Faith Arthanto
Rebellions AI/KAIST

Abstract: Graph neural networks (GNNs) are becoming increasingly important in many applications such as social science, natural science, and autonomous driving. Driven by real-time inference requirements, GNN acceleration has become a key research topic. Given the highly diverse GNN model types, such as graph convolution networks, graph attention networks, and graph isomorphism networks, with arbitrary aggregation methods and edge attributes, designing a generic GNN accelerator is challenging. In this talk, we discuss our proposed generic and efficient GNN accelerator, called FlowGNN, which can easily accommodate a wide range of GNN types. Without losing generality, FlowGNN can outperform CPU and GPU by up to 400 times. In addition, we discuss an open-source automation flow, GNNBuilder, which allows users to design their own GNNs in PyTorch and then automatically generates the accelerator code targeting FPGA.

Callie Hao
Assistant Professor, Georgia Institute of Technology

Bio: Dr. Cong (Callie) Hao is an assistant professor in ECE at Georgia Tech. She was a postdoctoral fellow at Georgia Tech from 2020-2021 and at UIUC from 2018-2020. She received the Ph.D. degree in Electrical Engineering from Waseda University in 2017, and the M.S. and B.S. degrees in Computer Science and Engineering from Shanghai Jiao Tong University. Her primary research interests lie in the joint area of efficient hardware design and machine learning algorithms, including software/hardware co-design for reconfigurable and high-efficiency computing and agile electronic design automation tools.  

Contact information: https://sharclab.ece.gatech.edu/ callie.hao@ece.gatech.edu

12:00 - 12:30
30 Minutes
Session 1
Generic and automated graph neural network acceleration
Callie Hao
Georgia Tech
12:30-13:30
60 Minutes
Lunch
Lunch Break

Abstract: Hardware accelerators can help data scientists and ML engineers run their applications much faster, but deploying these accelerators has been quite challenging until now. In this talk, we will show how ML developers can utilize the power of hardware accelerators like FPGAs with zero code changes. FPGAs are adaptable hardware platforms that can offer great performance, low latency, and reduced OpEx for applications like machine learning. We will show how users can enjoy the performance of hardware accelerators together with the ease of deployment of any other computing platform.

Chris Kachris
CEO, InAccel

Bio: Christoforos Kachris is the co-founder and CEO of InAccel, which helps companies speed up their AI/ML applications using hardware accelerators (FPGAs) in the cloud or on-prem. Christoforos holds a Ph.D. in Computer Engineering from Delft University of Technology and has more than 20 years of experience in hardware acceleration. He is the editor of “Hardware Accelerators in Data Centers” and co-author of more than 80 peer-reviewed scientific publications on FPGA-based hardware acceleration (with more than 2400 citations). He supervised 3 winners of the international Open Hardware contest for contributions to ML acceleration in 2018 and 2020.

Contact Information: https://inaccel.com/ chris@inaccel.com

13:30 - 14:00
30 Minutes
Session 2
How to speedup your ML applications, instantly
Chris Kachris
InAccel

Abstract: Many large-scale physics experiments, such as ATLAS at the Large Hadron Collider, the Deep Underground Neutrino Experiment, and sPHENIX at the Relativistic Heavy Ion Collider, rely on accurate simulations to inform data analysis and derive scientific results. However, the inevitable discrepancy between simulation and experiment requires corrections using heuristics in a conventional analysis workflow. It also prevents data-driven models, learned on simulation data, from inferring experiment data directly. Our goal is to develop machine learning methods that can bridge the gap between simulations and experiments. Our initial effort demonstrated the feasibility of such an approach using a Vision Transformer augmented U-Net under the CycleGAN framework. In this talk, I will present our model (UVCGAN) and its applications on two tiers of data from Liquid Argon Time Projection Chamber simulations. UVCGAN is also competitive against other advanced image translation models on open benchmark data sets.

Yihui (Ray) Ren
Associate Research Scientist, BNL

Bio: Yihui, a.k.a. "Ray", works in the general area of Artificial Intelligence (AI), its applications in science and its interaction with novel hardware. Ray's current research topics include unpaired image translation to bridge the gap between simulation and experiments, neural network optimization and deployment for real-time systems, novel hardware exploration and benchmarking, privacy-preserving AI, and bringing advanced AI methods to scientific domains.

Contact Information: https://www.bnl.gov/staff/yren yren@bnl.gov

14:00 - 14:30
30 Minutes
Session 2
Bridging Gaps between Simulation and Experiment
Yihui Ren
BNL

Abelardo Jara-Berrocal
AMD

14:30 - 15:00
30 Minutes
Session 2
Topic Title
Abelardo Jara-Berrocal
AMD
15:00 - 15:10
10 Minutes
Break
Coffee Break

Abstract: Large Language Models are shifting “what’s possible” in AI, but distributed training across thousands of traditional accelerators is massively complex and always suffers diminishing returns as more compute is added. Always? No longer. In this talk, I will give an overview of the Cerebras Wafer-Scale Cluster, which involved fundamentally redesigning chips, systems, compilers, workflow scaling, and beyond. I will present a cluster of 16 Cerebras CS-2 nodes that achieves near-perfect linear scaling across nearly 13 million AI cores, more cores than the world’s most powerful supercomputer.

Prashanth Thinakaran
MTS, Cerebras Systems

Bio: Prashanth holds a Ph.D. in Computer Science and Engineering from Penn State, where his research focused on systems aspects of high-performance and cloud computing. He has authored several conference papers and a book chapter in the area. He currently works for Cerebras Systems as an AI Cluster Infrastructure Engineer, developing the systems that enable large-scale AI model training on Cerebras's Wafer-Scale Cluster. This system recently made news for training the largest AI model on a single device and won the ACM Gordon Bell Special Prize for HPC COVID Research. News: https://www.cerebras.net/company/news/

Contact information: prashanth.thina@gmail.com

15:10 - 15:40
30 Minutes
Session 3
Near perfect AI scaling on 13 Million cores: Cerebras Architectural Overview
Prashanth Thinakaran
Cerebras

Abstract: From edge to AI and HPC, computer architectures are becoming more heterogeneous and complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware accelerators such as GPUs, FPGAs, and DSPs. This complexity is causing a crisis in programming systems and performance portability. Several programming systems are working to address these challenges, but the increasing architectural diversity is forcing software stacks and applications to be specialized for each architecture, resulting in poor portability and productivity. This talk argues that a more agile, proactive, and intelligent runtime system is essential to increase performance portability and improve user productivity. To this end, this talk introduces a new runtime system called IRIS. IRIS enables programmers to write portable and flexible programs across diverse heterogeneous architectures for different application domains from embedded/mobile computing to AI and HPC computing, by orchestrating multiple programming platforms in a single execution and programming environment.

Seyong Lee
Senior R&D Staff, Programming Systems Group @ ORNL

Bio: Seyong Lee is a Senior R&D Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research interests include parallel programming and performance optimization in heterogeneous computing environments, program analysis, and optimizing compilers. He received his PhD in Electrical and Computer Engineering from Purdue University, USA. He is a member of the OpenACC Technical Committee and a former member of the Exascale Computing Project PathForward Working Group. He has served as a program committee member, guest editor, and external reviewer for various conferences, journals, and proposals. His SC10 paper won the best student paper award, and his PPoPP09 paper was selected as the most cited paper among all papers published in PPoPP between 2009 and 2014. He received the IEEE Computer Society TCHPC Award for Excellence for Early Career Researchers in High Performance Computing at SC16 and served as an award committee member for the 2017 IEEE CS TCHPC Award.

Contact information: lees2@ornl.gov http://ft.ornl.gov/~lees2/

15:40 - 16:10
30 Minutes
Session 3
IRIS: A Portable Programming Framework for Extremely Heterogeneous Computing
Seyong Lee
ORNL
16:10 - 16:45
35 Minutes
Socials
Virtual social event (rotating 5 min breakout rooms)

Abstract: Deep learning technology has made significant progress on various cognitive tasks once believed impossible for computers to do as well as humans, including image classification, object detection, speech recognition, and natural language processing. However, the wide adoption of deep learning also highlights its shortcomings, such as limited generalizability and lack of interpretability. In addition, application-specific deep learning models require many manually annotated training samples and sophisticated learning schemes. With the performance of early models such as MLP, CNN, and RNN saturating, one notable recent innovation in deep learning architecture is the transformer model introduced in 2017. It has two properties that favor it over conventional models on the path toward artificial general intelligence. First, the performance of transformer models continues to grow with model size and training data. Second, transformers can be pre-trained on large amounts of unlabeled data through unsupervised or self-supervised learning and then fine-tuned quickly for each application. In this talk, I will present a multi-FPGA acceleration appliance named DFX for accelerating hyperscale transformer-based AI models. Optimized for OpenAI’s GPT (Generative Pre-trained Transformer) models, it executes end-to-end inference with low latency and high throughput. DFX uses model parallelism and an optimized, model-and-hardware-aware dataflow for fast simultaneous workload execution among multiple devices. Its compute cores operate on custom instructions and support all GPT operations, including multi-head attention, layer normalization, token embedding, and the LM head. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the channels of the high bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. Finally, DFX achieves a 5.58× speedup and 3.99× better energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21× more cost-effective than the GPU appliance, suggesting that it can be a promising alternative in cloud datacenters.

Joo-Young Kim
Assistant Professor, School of EE, KAIST

Bio: Joo-Young Kim received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from Korea Advanced Institute of Science and Technology (KAIST) in 2005, 2007, and 2010, respectively. He is currently an Assistant Professor in the School of Electrical Engineering at KAIST. He is also the Director of the AI Semiconductor Systems (AISS) research center. His research interests span various aspects of hardware design, including VLSI design, computer architecture, FPGA, domain-specific accelerators, hardware/software co-design, and agile hardware development. Before joining KAIST, Joo-Young was a Senior Hardware Engineering Lead at Microsoft Azure, working on hardware acceleration for its hyper-scale big data analytics platform named Azure Data Lake. Before that, he was one of the initial members of the Catapult project at Microsoft Research, where he deployed a fabric of FPGAs in datacenters to accelerate critical cloud services such as machine learning, data storage, and networking. Joo-Young is a recipient of the 2016 IEEE Micro Top Picks Award, the 2014 IEEE Micro Top Picks Award, the 2010 DAC/ISSCC Student Design Contest Award, the 2008 DAC/ISSCC Student Design Contest Award, and the 2006 A-SSCC Student Design Contest Award. He served as Associate Editor for the IEEE Transactions on Circuits and Systems I: Regular Papers (2020-2021).

Contact Information: https://castlab.kaist.ac.kr/our-team/joo-young-kim/ jooyoung1203@kaist.ac.kr

10:00 - 10:30
30 Minutes
Session 4
A Multi-FPGA Appliance for Accelerating Inference of Hyperscale Transformer Models
Joo-Young Kim
KAIST

Abstract: This talk will go over different AutoML methods that share the common themes of efficiency and hardware-awareness. We will present (1) a fast predictor-based search algorithm, (2) zero-cost NAS proxies that significantly speed up the evaluation phase of NAS, (3) accurate hardware latency prediction in NAS, (4) automated hardware-DNN codesign, and (5) real application case studies by which NAS delivered significant improvements. Our projects are all parts of an effort to enable on-device deployment of DNN within constrained hardware devices--an area in which AutoML can play a big role.

Mohamed Abdelfattah
Assistant Professor, Cornell Tech - Cornell University

Bio: Mohamed Abdelfattah is an Assistant Professor at Cornell Tech and in the Electrical and Computer Engineering Department at Cornell University. His research group is designing the next generation of machine-learning-centric computer systems for both datacenters and mobile devices. He received his BSc from the German University in Cairo, his MSc from the University of Stuttgart, and his PhD from the University of Toronto. After his PhD, Mohamed spent five years at Intel and Samsung Research.


Contact Information: https://www.mohsaied.com/ mohamed@cornell.edu

10:30 - 11:00
30 Minutes
Session 4
Neural Architecture Search
Mohamed Abdelfattah
Cornell

Abstract: Hardware architectures are undergoing major shifts due to the slowing of Moore's law. Innovations in packaging technology have given rise to chiplet-based architectures, which we are already seeing in CPUs and GPUs. In the near future, we will also see heterogeneous chiplet modules integrated on the same compute package, allowing a mix of accelerated workloads on the same architecture without offloading to PCIe-attached accelerators. In addition, thanks to coherent fabrics like CXL, we will see tighter integration between coherent accelerators and processors. This might also allow disaggregation of memory and storage at the rack level. CXL will be an enabling technology for CPU-memory and CPU-accelerator connectivity, allowing disaggregation into fine-grained resource pools. It will allow cloud vendors to provide precisely sized compute blocks for user workloads. Silicon photonics and in-package optics will help ensure that the latency induced by this disaggregated architecture stays within tolerable limits and, in most cases, does not come at the price of performance for latency-sensitive applications. All these hardware stack innovations will have a massive impact on future accelerators and their development. It will be important to look at these underlying trends in hardware design and what they offer so that software can leverage the performance gains accordingly. Architects will need to leverage the hardware and get performance with low overheads. The aim of the talk is to look at a CXL-enabled switch that pools memory, accelerators, and storage with compute elements to provide customers with economical, performance-oriented cloud infrastructure on demand. In addition, we aim to build a co-designed, serverless API to provision these units and provide a cost-efficient experience to users.

Gaurav Kaul
HPE

Bio: Gaurav Kaul is a senior systems architect at Hewlett Packard Enterprise (HPE), where he leads HPC and AI systems design for large customers in the EMEA region, including pre-exascale machines like Archer2 (University of Edinburgh), LUMI (Finland), and Shaheen (Saudi Arabia). His work involves working with customers to understand their workloads, hardware-software co-design on upcoming generations of accelerators and CPUs from AMD, Intel, and Nvidia, and onboarding users by sharing best practices and knowledge transfer. In addition to his role at HPE, Gaurav is involved in various standards such as OCP, CXL, UCIe, and MLIR for HPC and AI systems design. Prior to HPE, he worked at AWS, Intel, and IBM in various systems-related domains and processor design. He holds a Masters in Computer Science from the University of Manchester and lives in London, UK with his family.

 
Contact information: gaurav.kaul@hpe.com  

11:00 - 11:30
30 Minutes
Session 4
Impact of Chip and System Level Disaggregation – A Hardware-Software Codesign Approach
Gaurav Kaul
HPE

Abstract: For programming FPGA-based accelerators, high-level synthesis (HLS) is the mainstream approach. Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization, and versatility: 1. FPGA physical design can take hours, 2. FPGA reconfiguration time limits HLS from targeting complex workloads, and 3. HLS tools do not reason about cross-workload flexibility. Overlay approaches mitigate the above by mapping programmable designs (e.g. CPU, GPU, etc.) on top of FPGAs. However, the abstraction gap between overlay and FPGA leads to low efficiency and utilization. Our work develops a new FPGA programming paradigm, where an overlay architecture is automatically specialized to a set of representative applications. The key innovation is a highly customizable overlay design space based on spatial architectures, which encompass a range of designs from application-specific to general purpose. We leverage and extend prior work on accelerator compilers, SoC generation, and fast design space exploration (DSE) to create an end-to-end FPGA acceleration system called OverGen. OverGen can compete in performance with state-of-the-art HLS techniques, while requiring 10,000x less compile time and reconfiguration time.

Tony Nowatzki
Associate Professor, UCLA

Bio: Tony Nowatzki is an associate professor in the Computer Science Department at the University of California, Los Angeles, where he leads the PolyArch Research Group. He joined UCLA in 2017 after completing his PhD at the University of Wisconsin-Madison. He was also a consultant for Simple Machines Inc., an AI hardware startup that used several of his patents in fabricated chips. Academic recognition includes four IEEE Micro Top Picks awards, a CACM Research Highlights, best paper nominations at MICRO and HPCA, and a PLDI Distinguished Paper Award.
 

Contact information: https://web.cs.ucla.edu/~tjn/ tjn@cs.ucla.edu  

11:30 - 12:00
30 Minutes
Session 4
Overlay Generation: A New Paradigm for Productive FPGA Acceleration
Tony Nowatzki
UCLA
12:00-13:00
60 Minutes
Lunch
Lunch break

Abstract: In this project, we set out to find innovative ways to improve inference performance on CPUs for better resource utilization. By using sparse multiplication libraries and vendor-provided optimized libraries, we can notably improve ML inference performance.

Jared Baumann
Flapmax

Bio: Jared Baumann is a dual major graduate from TSU who has been working with Flapmax for a little over a year. He specializes in the development of low-level solutions for improving performance.

Contact information: jared@flapmax.com

13:00 - 13:30
30 Minutes
Session 5
Optimizing Machine Learning Inference Performance on CPU
Jared Baumann
Flapmax

Abstract: Entrepreneurs, researchers and AI practitioners who are climate-conscious (and receive monthly electricity bills) and are in search of a modern IT infrastructure to build their solutions need look no further. IBM Power server hardware, paired with a Red Hat OpenShift container software stack that stays “on” (with 99.999% availability) and a hardware-based AI accelerator (the on-chip Matrix Math Accelerator), maintains a lower energy footprint and Total Cost of Ownership (TCO) compared to x86 servers. At this session, learn about IBM Power servers, how you can leverage them to drive your AI roadmaps, and how to benefit from the expanding ecosystem of vendors that support IBM Power.

Azer Khan
GTM Leader, Banking & Industry Modernization, IBM

Bio: Yihui, a.k.a. "Ray", works in the general area of Artificial Intelligence (AI), its applications in science and its interaction with novel hardware. Ray's current research topics include unpaired image translation to bridge the gap between simulation and experiments, neural network optimization and deployment for real-time systems, novel hardware exploration and benchmarking, privacy-preserving AI, and bringing advanced AI methods to scientific domains.

Contact Information: https://www.bnl.gov/staff/yren yren@bnl.gov

13:30 - 14:00
30 Minutes
Session 5
How entrepreneurs can benefit from sustainable IBM Power servers (with built-in AI accelerators) to build OpenShift-based IT infrastructures
Azer Khan
IBM

Abstract: In the space of hardware acceleration alternatives, FPGAs lie in the middle of the programmability-efficiency spectrum, with GPUs being more programmable and ASICs being more efficient. FPGAs provide massive parallelism and are reconfigurable, which makes them very well suited for the fast-changing needs of DL applications. But how can we minimize the gap between ASICs and FPGAs in terms of performance and efficiency, while retaining the key strength of FPGAs, their reconfigurability? This talk will dive into our research that attempts to answer this question by exploring better reconfigurable fabrics for Deep Learning. We will discuss how FPGAs are evolving into domain-specific reconfigurable fabrics. Specifically, we will look at new blocks called Tensor Slices and CoMeFa RAMs. These blocks are a significant step towards closing the performance gap between FPGAs and ASICs. In this talk, we will take a peek into the architecture of these blocks and talk about the performance improvement and energy reduction that can be obtained for DL applications by using modern FPGAs containing these blocks.

Aman Arora
Ph.D. Fellow, UT Austin

Bio: Aman Arora is a PhD candidate and Graduate Fellow at The University of Texas at Austin. His research focuses on optimizing FPGA architectures to make them better Deep Learning accelerators. He has over 12 years of experience in the semiconductor industry in design, verification, testing and architecture roles. He is on the job market for a faculty position starting next Fall.

Contact information: https://amanarora.site aman.kbm@utexas.edu

14:00 - 14:30
30 Minutes
Session 5
In search for the right reconfigurable fabric for DL
Aman Arora
UT Austin

Abstract: HPC developers and users in scientific domains such as climate modeling, CAD for manufacturing, CFD, molecular dynamics, histopathology, seismology, protein folding, high energy physics, and astrophysics are exploring ways to use AI models to augment and accelerate HPC simulations, or to develop AI solutions for them. Their problems typically have extremely large, multi-dimensional input data, unlike the data used in popular DL domains such as image recognition, recommendation engines, and language and text translation. This leads to significantly larger memory usage and I/O ingestion challenges in both the input data pipeline and end-to-end HPC-AI workload pipelines. The talk will feature Intel’s best practices for optimizing HPC/AI workloads, including code optimizations using Intel Optimized TensorFlow and PyTorch with Intel extensions for training and inference. These optimizations result in up to a 3-4X improvement on Intel’s 4th Gen processor (code-named Sapphire Rapids) with HBM, using AMX/TMUL instructions that support mixed-precision FP32 and BFloat16 for training and quantized inference, over Intel’s 3rd Gen Scalable Processor using AVX-512.

Nalini Kumar
Intel

Bio: Nalini Kumar works on HPC/AI workload optimization, analysis, and modeling at Intel in Santa Clara. Her primary research interests are in applying parallel, high-performance, and reconfigurable computing to traditional HPC as well as AI workloads. She is also interested in performance modeling and prediction of full applications and workflows on large-scale systems. She received her PhD and MS in Electrical and Computer Engineering from the University of Florida.

Contact information: nalini.kumar@intel.com  

14:30 - 15:00
30 Minutes
Session 5
HPC AI Workload Best Practices on Next Gen Intel® Xeon® Max processors (codenamed Sapphire Rapids)
Nalini Kumar
Intel
15:00 - 15:10
10 Minutes
Break
Coffee Break

Abstract: From edge to AI and HPC, computer architectures are becoming more heterogeneous and complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware accelerators such as GPUs, FPGAs, and DSPs. This complexity is causing a crisis in programming systems and performance portability. Several programming systems are working to address these challenges, but the increasing architectural diversity is forcing software stacks and applications to be specialized for each architecture, resulting in poor portability and productivity. This talk argues that a more agile, proactive, and intelligent runtime system is essential to increase performance portability and improve user productivity. To this end, this talk introduces a new runtime system called IRIS. IRIS enables programmers to write portable and flexible programs across diverse heterogeneous architectures for different application domains from embedded/mobile computing to AI and HPC computing, by orchestrating multiple programming platforms in a single execution and programming environment.

Seyong Lee
Senior R&D Staff, Programming Systems Group @ ORNL

Bio: Seyong Lee is a Senior R&D Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research interests include parallel programming and performance optimization in heterogeneous computing environments, program analysis, and optimizing compilers. He received his PhD in Electrical and Computer Engineering from Purdue University, USA. He is a member of the OpenACC Technical Committee and a former member of the Exascale Computing Project PathForward Working Group. He has served as a program committee member, guest editor, and external reviewer for various conferences, journals, and proposals. His SC10 paper won the best student paper award, and his PPoPP09 paper was selected as the most cited paper among all papers published in PPoPP between 2009 and 2014. He received the IEEE Computer Society TCHPC Award for Excellence for Early Career Researchers in High Performance Computing at SC16 and served as an award committee member for the 2017 IEEE CS TCHPC Award.

Contact information: lees2@ornl.gov http://ft.ornl.gov/~lees2/

15:40 - 16:10
30 Minutes
Session 3
IRIS: A Portable Programming Framework for Extremely Heterogeneous Computing
Seyong Lee
ORNL
15:10 - 15:55
45 Minutes
Panel Session
Panel Session: Pathway to Scale-Out and Scale-Up AI
MODERATOR
Yi-Chung Chen
MediaTek
PANELISTS
Chris Kachris
InAccel
Azer Khan
IBM
Seyong Lee
ORNL
Dave Ojika
Flapmax

Startups

KaCyber

Making seamless mobility a reality in Africa

Legit Car

Building Africa’s biggest vehicle data service

SnarkHealth

Partner with your doctor. Leverage your data. Pay less.

Capsa

The Amazon for Invoices in Africa

Sectors

Industrial
Health
Education
Sustainability
Agriculture
FinTech

Partners