Conversion of RE to NFA
Finite Automaton (FA)
A Finite Automaton (FA) is a mathematical model of computation that acts as an abstract machine for recognizing patterns. It consists of a finite number of states and transitions between them, based on input symbols.
Its main purpose is to determine whether an input string conforms to a predefined pattern — accepting or rejecting it.
Formally, an FA is a 5-tuple:
[ M = (Q, \Sigma, \delta, q_0, F) ]
where:
- Q → Finite set of states
- Σ (Sigma) → Finite set of input symbols (alphabet)
- δ (delta) → Transition function
- DFA: (\delta : Q \times \Sigma \to Q)
- NFA: (\delta : Q \times \Sigma \to 2^Q) (set of possible next states)
- q₀ → Initial state ((q₀ \in Q))
- F → Set of accepting (final) states, (F \subseteq Q)
Applications:
Finite automata form the foundation of regular languages and are used in:
- Compiler design (lexical analysis)
- Pattern matching
- Input validation
Regular Expressions (RE)
A Regular Expression (RE) is a symbolic representation of a regular language. It defines patterns over an alphabet using operators such as union, concatenation, and repetition.
A language is regular iff it can be described by a regular expression.
Formal Definition of Regular Expressions
Let Σ be an alphabet. The set of regular expressions over Σ is defined recursively:
Base Cases
Empty set (∅):
- RE:
∅ - Language: (L(∅) = {}) (no strings)
- RE:
Empty string (ε):
- RE:
ε - Language: (L(ε) = {""})
- RE:
Single symbol (a ∈ Σ):
- RE:
a - Language: (L(a) = {"a"})
- RE:
Recursive Cases (If R₁ and R₂ are REs)
Union (R₁ | R₂):
[ L(R₁ \mid R₂) = L(R₁) \cup L(R₂) ]Concatenation (R₁R₂):
[ L(R₁R₂) = {xy \mid x \in L(R₁),; y \in L(R₂)} ]Kleene Star (R₁):*
[ L(R₁^*) = {\epsilon, w, ww, www, \dots \mid w \in L(R₁)} ]Parentheses ():
- Used for grouping, e.g., ((a \mid b)c)
Non-Deterministic Finite Automaton (NFA)
An NFA is a finite automaton where multiple transitions for the same symbol are allowed, and ε-transitions (without consuming input) are possible.
Characteristics
- Multiple transitions: From one state, there may be several outgoing transitions for the same input.
- ε-transitions: State changes can occur without reading input.
- Acceptance rule: An input is accepted if at least one computation path leads to an accepting state.
- Equivalence: Every NFA has an equivalent DFA, though conversion may cause an exponential increase in states.
Thompson’s Construction: RE → NFA
Thompson’s Construction systematically converts a regular expression into an equivalent NFA.
Key Property: Each constructed NFA has exactly one start state and one final state, which makes recursive construction simple.
Construction Rules
Base Cases
Empty set (∅):
- NFA: Start and final states with no transition.
Empty string (ε):
- NFA: Start →ε→ Final
Single symbol (a):
- NFA: Start →a→ Final
Recursive Cases
Concatenation (R₁R₂):
- Connect the final state of NFA(R₁) to the start state of NFA(R₂) using an ε-transition.
- Start state = start of R₁
- Final state = final of R₂
Union (R₁ | R₂):
- Create a new start state and a new final state.
- Add ε-transitions from the new start state to the start states of both R₁ and R₂.
- Add ε-transitions from the final states of R₁ and R₂ to the new final state.
Kleene Star (R*)
- Create a new start state and a new final state.
- Add an ε-transition from the new start to the old start state.
- Add an ε-transition from the old final state back to the old start state (loop for repetition).
- Add an ε-transition from the old final state to the new final state.
- Add an ε-transition from the new start to the new final state (for zero repetitions).