# 🛠 Compiler
How to write your own programming language?
- Examples
- GNU Compiler Collection - GCC
- LLVM
- Clang
- javascript Compilers - V8 Engine
- CPython - python compiler/interpreter
- Guido van Rossum - Benevolent dictator for life BDFL for Python until 2018
- Programming is mainly writing correct code for compilers. If you can program in a language means you know how to write instructions for a specific compiler.
- Programming languages differ widely because their compilers are written so.
# 🌴 Types of Compilers
- Cross-Compiler
compiles code for a target platform(CPU architecture)
- compiler for different CPU Architecture
- on x86 compiling code for ARM Architecture
- Assembler
Translate code to assembly languages
- Transpiler - S2S compiler
compiles code from one version to another. Eg: Babel in JS world
Eg: From modern ES2015 to ES5 to support older browsers
- Decompiler
compiles from a Low Level Language to High Level Language
- Bytecode Compiler
JAV, Python VM
JIT - Just in time Compilers
- Bootstrapped Compilers
- Self Compiled - compiled by itself
# 🖌 Compiler Design
- Translates code from one form to another
# 🎨 Compiler Frontend
Generally a 5 Stages Process Line. Just how a Buddha rice bown is assembles in a restaurant.
# 🏎 Canonicalization
- converting to a format more suitable to work with
- remove extra spaces etc.
Very similar to Database System Canonicalization of data to 1NF, 2NF or 3NF
# Preprocessing
- Macro Substitution
- Conditional Compilation
# 🍇 Lexical Analysis / Tokenization
very important Stage and most complex when implementing any programming language
- source code is converted to token, so that a hierarical tree could be generated
- Can happen in 2 Stages
- Scanning
- convert code blocks to leximes
- keywords
- literals
- identifiers
- references
- comments
- seperators
- operators
- convert code blocks to leximes
- Evaluating
- Attackes leximes with values so that they could be used in Parsing stage
- Token Name: Token Value Format
- Scanning
Scanning
Scanning is performed using
- RegEx
- CFG Context-free Grammer
- RegEx Languages and
- FSM Finite State Automata Schemes
- BNF, EBNF
- Intended for consumption by humans
- BNF Python
- Python Lexical Analysis
- Full Python Grammer
- Difference b/w regular grammer and Context free grammer
- How to write grammer for a programming language
- Chomsky hierarchy of grammer
- Allison Parrish
- Grammer Parsing Engines
- Grammer
Language of languages, Chomsky Hierarchy
- CFG - Context Free Grammer
Every production rules of the form
where,
is non-terminal and is terminal character 4-tuple
where,
is called a nonterminal character or a variable is a finite set of terminals, is a finite relation from to start variable
- CSG - Context Sensitive Grammar
more general than CFG
A formal grammar 📐
Notation Meaning is a set of nonterminal symbols is a set of terminal symbols is a set of production rules, and is the start symbol is context-sensitive if all rules in
are of the form CSG formal definition
where,
- 👉
- 👉
and - 👉
- 👉
# 🇩🇪 Grammar
- Formal Grammer
Finite set of production Rules
Examples, write grammer to match all a, aa, aaaa, aaaaaaa
# 🌲 Parsing / Syntax Analysis
crucial stage in Process
Outputs Parse Tree
# 💻 Symentic Analysis
Outputs Symbol Table
- Builds Type checking
- Object Binding
# 🖥 Compiler Middle End
- Optimizations
- Dead code removal
- Constant propagation
- Rechability Analysis
# 🐲 Compiler Backend
Could be reuses as in cross Compilers
- CPU Architecture level Transformation
- CPU Scheduling Flags
- CPU architecture based Optimization to utilize CPU features like
- Hypreadthreaded feature
- Multicore
- GPUs etc
- CPU Heuristics and Algorithms
# 🔎 Overview
# How compilers are used?
Compiler interface usually have a CLI interface. Works just like another CLI tool so as to speak. Take examples of python, node, rust, java, bash, zsh compilers.
# 🐮 Parser Generators
Parser check the grammer and syntax of a natural/computer language. Syntatic Analysis
- Compiler-compiler softwares
- YACC
- GNU Bison
- documentation
- Generates a parser in
C
,C++
, orjava
- Perl 5
- GNU OCtave
- MySQL, PostgresSQL
- Bash
- Go - initiall then shifted to their own custom tool
- PHP, Ruby
- ANTLR
- CPython uses ASDL
- Can generate parser from calculator to complex language
filename.l
- 4 sections
- commenting is not allowed
Bison Input Grammer File structure
%{
Prologue
%}
Bison declarations
%%
Grammar rules
%%
Epilogue