# Lecture 1 — Introduction

Stanford CS343D (Winter 2023) Fred Kjolstad

## Course staff



### Fred Kjolstad



### AJ Root


### Administrivia

- Syllabus at https://cs343d.github.io
- Discussion will happen through Ed in Canvas
- Office Hours
  - Fred: Monday 10–11am in Gates 486
  - AJ: Thursday 2–3pm in Gates 4A common area


# Goals of the Course

- Introduce you to domain-specific and collection-oriented programming languages from the past
- Introduce you to compiler techniques to get good performance for dense and sparse applications
- Bring you to one of the frontiers of PL and compiler research
- Get you thinking about abstractions and semantics
- "What are the three biggest ideas in computer science? Abstraction, abstraction, abstraction."
  - Paul Hudak



## Expectations

- Read papers and engage in class (25%)
  - ~2 readings per class
  - Classes will have a lecture followed by paper discussion
  - Everyone will get a chance to lead a discussion
- Two assignments (20%)
  - MiniAPL
  - Sparse Coiteration Code Generation
- Essay (15%)
- Project (40%)



### It is all about performance and productivity



Productivity: Matlab, NumPy



## Performance translates to less time and less energy



- Data centers
- Tensor Processing Unit
- Supercomputers
- Self-driving cars
- Cell-phone batteries



# Eras of Computing

- Era of simulation (1945–1965)
- Era of data processing (1965–1984)
- Era of personal computing (1984–1995)
- Era of communication (1995–2018)
- Era of interaction (2018–???)













# Modern applications are performance hungry

### Simulation and Optimization

- Robotics
- Graphics simulations
- Virus modelling

### Data Analytics

- Social networks
- Computational biology
- Recommender systems

### Machine Learning

- Neural networks
- Convolutional networks
- Graph convolutional networks



## Modern hardware is heterogeneous and programming it is hard

Figure: 5th Gen Intel® Core™ processor die map (14 nm, 2nd-generation Tri-Gate 3-D transistors; dual-core die shown), for the 5th Gen Intel® Core™ processor with Intel® HD Graphics 6000 or Intel® Iris™ Graphics 6100.

- Transistor count: 1.9 billion (4th Gen Core processor, U series: 1.3 billion)
- Die size: 133 mm² (4th Gen Core processor, U series: 181 mm²)
- Cache is shared across both cores and processor graphics



### Hardware in the Clouds








# A lot of industry activity



Figure: AI Chip Landscape (S.T., V0.7, Dec. 2019). More at https://basicmi.github.io/AI-Chip/



# The Road to Point Reyes (Lucasfilm, 1984)



### **R.E.Y.E.S = Renders Everything You Ever Saw**



    surface corrode(float Ks=0.4, Ka=0.1, rough=0.25)
    {
        float i, freq=1, turb=0;
        // compute fractal texture
        for( i=0; i<6; i++ ) {
            turb += 1/freq * noise(freq*P);
            freq *= 2;
        }
        // perturb surface
        P -= turb * normalize(N);
        N = faceforward(normalize(calculatenormal(P)));
        // compute reflection and final color
        Ci = Cs * (Ka*ambient() + Ks*specular(N,I,rough));
    }



# Little Languages (DSLs)

> Defining "little" is harder; it might imply that the first-time user can use this system in an hour or master the language in a day, or perhaps the first implementation took just a few days. In any case, a little language is specialized to a particular problem domain and does not include many features found in conventional languages.

Jon Bentley, CACM 29(8), 1986

# UNIX "DSLs"

- bash, csh: shell programming
- awk: processing strings
- sed: regular expressions
- troff, pic, tbl, eqn, ...
- printf: formatting
- ...
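To make the "little language" idea concrete, the sketch below uses printf-style format strings through Python's `%` operator. The values and variable names are hypothetical and chosen only for illustration; the directives themselves follow the usual printf conventions.

```python
# printf-style format strings are a tiny language of their own:
# each directive describes how one argument is rendered.

name = "taco"    # hypothetical values, chosen only for illustration
count = 7
ratio = 3.14159

# %-8s  : left-justify a string in a field of width 8
# %03d  : zero-pad an integer to width 3
# %5.2f : fixed-point with 2 decimals in a field of width 5
line = "%-8s|%03d|%5.2f" % (name, count, ratio)
print(line)  # prints "taco    |007| 3.14"
```

The format string has no loops, variables, or functions, which matches Bentley's point that a little language omits most features of a conventional language.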

# **Programming Languages**

[Figure: languages along a productivity axis; Python is labeled.]

# **Domain-Specific Languages**



[Figure: languages along a productivity axis; Python is labeled.]

# **Graphics Libraries**

```
glPerspective(45.0);
for( ... ) {
    glTranslate(1.0,2.0,3.0);
    glBegin(GL_TRIANGLES);
        glVertex(...);
        glVertex(...);
        ...
    glEnd();
}
glSwapBuffers();
```

### **OpenGL** "Grammar"

- `<Scene> = <BeginFrame> <Camera> <World> <EndFrame>`
- `<Camera> = glMatrixMode(GL_PROJECTION) <View>`
- `<View> = glPerspective | glOrtho`
- `<World> = <Objects>*`
- `<Object> = <Transforms>* <Geometry>`
- `<Transforms> = glTranslatef | glRotatef | ...`
- `<Geometry> = glBegin <Vertices> glEnd`
- `<Vertices> = [glColor] [glNormal] glVertex`
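One way to see that a sequence of library calls really does follow a grammar is to write a recognizer for it. The sketch below is a minimal Python illustration, not part of OpenGL. It makes several assumptions that are not in the slide: `BeginFrame` and `EndFrame` are treated as opaque tokens, the `...` alternative in `<Transforms>` is dropped, call arguments are ignored, and `<Vertices>` may repeat one or more times between `glBegin` and `glEnd`. The name `is_valid_scene` is hypothetical.

```python
import re

# Each nonterminal of the grammar becomes a regex fragment over
# space-terminated call names. See the lead-in for the assumptions made.
VERTEX = r"(?:glColor )?(?:glNormal )?glVertex "
GEOMETRY = rf"glBegin (?:{VERTEX})+glEnd "
TRANSFORM = r"(?:glTranslatef|glRotatef) "
OBJECT = rf"(?:{TRANSFORM})*{GEOMETRY}"
WORLD = rf"(?:{OBJECT})*"
VIEW = r"(?:glPerspective|glOrtho) "
CAMERA = rf"glMatrixMode {VIEW}"
SCENE = re.compile(rf"BeginFrame {CAMERA}{WORLD}EndFrame ")


def is_valid_scene(calls):
    """Return True if the sequence of call names derives from <Scene>."""
    text = "".join(name + " " for name in calls)
    return SCENE.fullmatch(text) is not None


good = ["BeginFrame", "glMatrixMode", "glPerspective",
        "glTranslatef", "glBegin", "glColor", "glVertex", "glVertex", "glEnd",
        "EndFrame"]
bad = ["BeginFrame", "glMatrixMode", "glPerspective",
       "glBegin", "glVertex",          # glEnd is missing
       "EndFrame"]

print(is_valid_scene(good))  # True
print(is_valid_scene(bad))   # False
```

A regular expression suffices here because the grammar, as written, has no recursion; a grammar with nested scopes would need a real parser.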

### Advantages

- Productivity
  - Graphics library is easy to use
- Portability
  - Runs on wide range of GPUs
- Performance
  - Vertices/fragments are independent
  - Rasterization can be done in hardware
  - Textures are read-only; texture filtering hardware
  - Specialized scheduler for pipeline
  - ...
  - Allows for super-optimized implementations
- Encourage innovation
  - Allows vendors to radically optimize hardware architecture to achieve efficiency
  - Allows vendors to introduce new low-level programming models and abstractions


# **Definition: Domain-Specific Language**

A language or library that exploits domain knowledge for productivity and performance.

Widely used in many application areas:

- Matlab / R
- SQL / MapReduce / Microsoft's LINQ
- TensorFlow, PyTorch

# Why DSLs Work

- Add the semantics of the domain
  - High-level program transformations
- Restrict programming language
  - Less-general computations
  - Guarantee static analysis
- Known parallelization strategies
  - Someone has shown how to robustly do it

=> Tractable
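The points in this section about restricting the language and enabling high-level transformations can be illustrated with a deliberately tiny example. The sketch below defines a hypothetical DSL whose only operation is an element-wise `Map`, assumed to be a pure function of one element. Under that assumption, fusing a chain of maps into a single pass is legal for every program in the language, so no per-program analysis is needed. The names `Map`, `fuse`, and `run` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Map:
    """The only operation in this hypothetical DSL: apply fn to every element.

    Assumption: fn is pure and depends on a single element only.
    """
    fn: Callable[[float], float]


def fuse(pipeline: List[Map]) -> Map:
    """Rewrite a chain of maps into one map (a high-level transformation)."""
    def fused(x: float) -> float:
        for stage in pipeline:
            x = stage.fn(x)
        return x
    return Map(fused)


def run(pipeline: List[Map], data: List[float]) -> List[float]:
    """Reference semantics: one full pass over the data per stage."""
    for stage in pipeline:
        data = [stage.fn(x) for x in data]
    return data


program = [Map(lambda x: x * 2.0), Map(lambda x: x + 1.0)]
data = [1.0, 2.0, 3.0]

unfused = run(program, data)        # two passes, one temporary list
fused = run([fuse(program)], data)  # one pass, no temporary list
assert fused == unfused
print(fused)  # [3.0, 5.0, 7.0]
```

In a general-purpose language the same rewrite would require proving that the loop bodies do not interfere, which is exactly the analysis the restriction makes unnecessary.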

### Moore's Law – The number of transistors on integrated circuit chips (1971-2016)

Moore's law describes the empirical regularity that the number of transistors on integrated circuits doubles approximately every two years. This advancement is important as other aspects of technological progress – such as processing speed or the price of electronic products – are strongly linked to Moore's law.

Data source: Wikipedia (https://en.wikipedia.org/wiki/Transistor_count). The data visualization is available at OurWorldinData.org, licensed under CC-BY-SA by the author Max Roser.
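The doubling rule can be turned into a quick calculation. The sketch below is only arithmetic on the figure's own date range (1971 to 2016) and the two-year doubling period; the function name `growth_factor` is hypothetical.

```python
def growth_factor(start_year, end_year, doubling_period_years=2.0):
    """Multiplicative growth implied by doubling every `doubling_period_years`."""
    doublings = (end_year - start_year) / doubling_period_years
    return 2.0 ** doublings


# Date range taken from the figure title (1971-2016).
doublings = (2016 - 1971) / 2.0
factor = growth_factor(1971, 2016)

print(f"{doublings} doublings, growth factor of about {factor:.2e}")
# prints "22.5 doublings, growth factor of about 5.93e+06"
```

Under these assumptions, 45 years of doubling every two years corresponds to roughly a 5.9-million-fold increase in transistor count.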

# A New Golden Age for Computer Architecture: Domain-Specific Hardware/Software Co-Design



John Hennessy and David Patterson, **2017 Turing Award**

# Large efficiency gains with domain-specific architectures



Source: Bob Brodersen, Berkeley Wireless group

## **Domain-Specific Architectures**



### Google Tensor Processing Unit



### NVIDIA Turing Architecture

# **Collection-Oriented Languages**



A collection-oriented programming model provides collective operations on some collection/abstract data structure.

- Relations: Relational Algebra C70
- Graphs: GraphLab L10
- Meshes: Liszt D11
- Grids: Sejits S09, Halide
- Vectors: Vector Model B90
- Matlab M79, taco K17
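The difference between element-at-a-time code and collective operations can be seen on a dot product. The sketch below uses only Python's standard library; the vectors `a` and `b` are hypothetical inputs, and the example illustrates the programming model rather than any of the systems listed in this section.

```python
from functools import reduce
import operator

a = [1.0, 2.0, 3.0]  # hypothetical input vectors
b = [4.0, 5.0, 6.0]

# Element-at-a-time style: the index, loop order, and accumulator are explicit.
dot_loop = 0.0
for i in range(len(a)):
    dot_loop += a[i] * b[i]

# Collection-oriented style: one collective multiply over both vectors,
# then one collective reduction. No index or iteration order is named.
products = map(operator.mul, a, b)
dot_collective = reduce(operator.add, products, 0.0)

assert dot_loop == dot_collective
print(dot_collective)  # 32.0
```

Because the collective version never names an iteration order, an implementation is free to choose one, which is what lets compilers for such languages parallelize or reorder the work.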







# Modern Domain-Specific Languages/Compilers





- Halide













