Monday, April 25, 2011

Designing Finite Automata

Finite state machines

A finite state machine (FSM, also known as a deterministic finite automaton or DFA) is a way of representing a language (meaning a set of strings; we're interested in representing the set of strings matching some pattern). It's explicitly algorithmic: we represent the language as the set of those strings accepted by some program. So, once you've found the right machine, you can test whether a given string matches just by running it.
The KMP algorithm works by turning the pattern it's given into a machine, and then running the machine. The hard part of KMP is finding the machine.
We need some restrictions on what we mean by "program". This is where "deterministic & finite" come from.
One way of thinking about it is in terms of programs without any variables. All such a program can do is look at each incoming character, determine what line to go to, and eventually return true or false (depending on whether it thinks the string matches or doesn't).
As a simple warmup example, let's look at a program for an easier problem: testing whether a string has an even number of characters.
#include <stdio.h>

#define TRUE 1
#define FALSE 0

int main()
    {
        for (;;) {
            if (getchar() == EOF) return TRUE;   /* seen an even number of characters */
            if (getchar() == EOF) return FALSE;  /* seen an odd number of characters */
        }
    }
Note the lack of variables. To simplify things, we'll rewrite programs to avoid complicated loops, and instead just use goto statements. (You've probably been taught that gotos are bad, but this sort of rewriting happens all the time; in fact, every time you run a compiler, it has to do this.)

#include <stdio.h>

#define TRUE 1
#define FALSE 0

int main()
    {
        even:   /* we've seen an even number of characters so far */
            if (getchar() == EOF) return TRUE;
            else goto odd;

        odd:    /* we've seen an odd number of characters so far */
            if (getchar() == EOF) return FALSE;
            else goto even;
    }
We've chosen labels for the goto statements to represent what we know about the string so far (in this problem, whether we've seen an even or odd number of characters so far). Because there are no variables, we can only represent knowledge about the input in terms of where we are in the program. We think of each line in the program as being a state, representing some specific fact about the part of the string we've seen so far. Here the states are "even" and "odd", and represent what we know about the number of characters seen so far. Since there are no variables, the only thing a machine can do in a given state (state = what line the program is on) is to go to different states, depending on what character it sees.
This can be a useful programming style; for instance I am using a program written in close to this style to filter some html files on my web pages. One advantage of this style is that there are few ways the program can be tricked into having unexpected values in its variables (since there are no variables) so it is hard to make such a program crash. But it's a little long and cumbersome, and you wouldn't want to have to compile a separate C program every time you ran "grep".
Rather than writing C code, we'll draw pictures with circles and arrows. (These pictures are known as state diagrams.) A circle will represent a state, an arrow with a label will represent that we go to that state if we see that character. (You can think of this as just being a special kind of graph.) We'll also draw an arrow from nowhere to the first state the program starts in, and arrows to nowhere if the program returns true if the string ends at that state. So our program can be represented with the following diagram.

[State diagram of the even/odd machine]

In class I described a more complicated example, that could be used by the C preprocessor (a part of most C compilers) to tell which characters are part of comments and can be removed from the input:
[State diagram of the C comment machine]

It's easy to turn such a diagram into a program that simply has one label and one case statement per state.
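For instance, here is a rough C sketch of the comment machine, written with one label and one switch per state; the label names follow the five states used in the transition table below (none, slash, C++, C, star), and the program simply reads standard input and reports which state the machine ends in:

    #include <stdio.h>

    /* One label and one switch per state of the C comment machine. */
    int main()
    {
        int c;

    none:                   /* outside any comment */
        if ((c = getchar()) == EOF) { puts("ended in state none"); return 0; }
        switch (c) {
        case '/': goto slash;
        default:  goto none;
        }

    slash:                  /* just saw '/', which might start a comment */
        if ((c = getchar()) == EOF) { puts("ended in state slash"); return 0; }
        switch (c) {
        case '/': goto cpp_comment;
        case '*': goto c_comment;
        default:  goto none;
        }

    cpp_comment:            /* inside a // comment; it ends at end of line */
        if ((c = getchar()) == EOF) { puts("ended in state C++"); return 0; }
        switch (c) {
        case '\n': goto none;
        default:   goto cpp_comment;
        }

    c_comment:              /* inside a slash-star comment */
        if ((c = getchar()) == EOF) { puts("ended in state C"); return 0; }
        switch (c) {
        case '*': goto star;
        default:  goto c_comment;
        }

    star:                   /* inside a slash-star comment, just saw '*' */
        if ((c = getchar()) == EOF) { puts("ended in state star"); return 0; }
        switch (c) {
        case '/': goto none;
        case '*': goto star;
        default:  goto c_comment;
        }
    }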
If we're given such a diagram, and a string, we can easily see whether the corresponding program returns true or false. Simply place a marker (such as a penny) on the initial state, and move it around one state at a time until you run out of characters. Once you run out of characters, see whether the state you're in has an "accept" arrow -- if so, the pattern matches, and if not it doesn't.
In a computer, we obviously don't represent these things with circles and arrows. Instead they can be viewed as just being a special kind of graph, and we can use any of the normal graph representations to store them.
One particularly useful representation is a transition table: we make a table with rows indexed by states, and columns indexed by possible input characters. Then simulating the machine can be done simply by looking up each new step in the table. (You also need to store separately the start and accept states.) For the machine above that tests whether a string has even length, the table might look like this:
                any
                ---
    even:       odd
    odd:        even
For the C comment machine, we get a more complicated table:
                /       *       EOL     other
                ---     ---     ---     ---
    none:       slash   none    none    none
    slash:      C++     C       none    none
    C++:        C++     C++     none    C++
    C:          C       star    C       C
    star:       none    star    C       C
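To make the table lookup concrete, here is a minimal C sketch (the run helper and the numbering of the states as 0 and 1 are illustrative assumptions) that stores the even-length machine as a transition table and simulates it on a string:

    #include <stdio.h>

    #define NSTATES 2
    #define NCHARS  256

    /* Generic table-driven simulation: delta[q][c] is the next state. */
    int run(int delta[NSTATES][NCHARS], int start, int accept, const char *input)
    {
        int state = start;
        for (const unsigned char *p = (const unsigned char *)input; *p != '\0'; p++)
            state = delta[state][*p];      /* one table lookup per character */
        return state == accept;
    }

    int main()
    {
        static int delta[NSTATES][NCHARS];

        /* Even-length machine: state 0 = "even", state 1 = "odd";
           every character flips the state. */
        for (int c = 0; c < NCHARS; c++) {
            delta[0][c] = 1;
            delta[1][c] = 0;
        }
        printf("%d\n", run(delta, 0, 0, "abcd"));   /* prints 1: length 4 is even */
        printf("%d\n", run(delta, 0, 0, "abc"));    /* prints 0: length 3 is odd  */
        return 0;
    }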
Since a state diagram is just a kind of graph, we can use graph algorithms to find some information about finite state machines. For instance we can simplify them by eliminating unreachable states, or find the shortest path through the diagram (which corresponds to the shortest string accepted by that machine).

Automata and string matching

The examples above didn't have much to do with string matching. Let's look at one that does. Suppose we want to "grep nano". Rather than just starting to write states down, let's think about what we want them to mean. At each step, we want to store in the current state the information we need about the string seen so far. Say the string seen so far is "...stuvwxy", then we need to know two things:
  1. Have we already matched the string we're looking for ("nano")?
  2. If not, could we possibly be in the middle of a match?
If we're in the middle of a match, we need to know how much of "nano" we've already seen. Also, depending on the characters we haven't seen yet, there may be more than one match that we could be in the middle of -- for instance if we've just seen "...nan", then we have different matches if the next characters are "o..." or if they're "ano...". But let's be optimistic, and only remember the longest partial match. So we want our states to be partial matches to the pattern. The possible partial matches to "nano" are "", "n", "na", "nan", or (the complete match) "nano" itself. In other words, they're just the prefixes of the string. In general, if the pattern has m characters, we need m+1 states; here m=4 and there are five states.
The start and accept states are obvious: they are just the 0- and m-character prefixes. So the only thing we need to decide is what the transition table should look like. If we've just seen "...nan", and see another character "x", what state should we go to? Clearly, if x is the next character in the match (here "o"), we should go to the next longer prefix (here "nano"). And clearly, once we've seen a complete match, we just stay in that state. But suppose we see a different character, such as "a"? That means that the string so far looks like "...nana". The longest partial match we could be in is just "na". So from state "nan", we should draw an arrow labeled "a" to state "na". Note that "na" is a prefix of "nano" (so it's a state) and a suffix of "nana" (so it's a partial match consistent with what we've just seen).
In general, the transition from state+character goes to the state given by the longest string that's simultaneously a prefix of the original pattern and a suffix of the state+character we've just seen. This is enough to tell us what all the transitions should be. If we're looking for pattern "nano", the transition table would be
                n       a       o       other
                ---     ---     ---     ---
    empty:      "n"     empty   empty   empty
    "n":        "n"     "na"    empty   empty
    "na":       "nan"   empty   empty   empty
    "nan":      "n"     "na"    "nano"  empty
    "nano":     "nano"  "nano"  "nano"  "nano"
For instance the entry in row "nan" and column n says that the largest string that's simultaneously a prefix of "nano" and a suffix of "nan"+n="nann" is simply "n". We can also represent this as a state diagram:
Simulating this on the string "banananona", we get the sequence of states empty, empty, empty, "n", "na", "nan", "na", "nan", "nano", "nano", "nano". Since we end in state "nano", this string contains "nano" in it somewhere. By paying more careful attention to when we first entered state "nano", we can tell exactly where it occurs; it is also possible to modify the machine slightly and find all occurrences of the substring rather than just the first occurrence.
This description is enough to get a string matching algorithm that takes something like O(m^3 + n) time: O(m^3) to build the state table described above, and O(n) to simulate it on the input file. There are two tricky points to the KMP algorithm. First, it uses an alternate representation of the state table which takes only O(m) space (the one above could take O(m^2)). And second, it uses a complicated loop to build the whole thing in O(m) time. We'll see this algorithm next time.
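Here is a rough C sketch of that O(m^3 + n) approach (the names build_table and match, the MAXM bound, and the assumption of 8-bit characters are illustrative, not part of the original description); KMP's contribution is to replace the slow table-building step:

    #include <stdio.h>
    #include <string.h>

    #define MAXM  64       /* assumed upper bound on the pattern length */
    #define ALPHA 256      /* assume 8-bit characters */

    static int delta[MAXM + 1][ALPHA];   /* delta[q][c] = next state */

    /* State q means "the last q characters read equal the first q characters
       of the pattern".  delta[q][c] is the length of the longest pattern
       prefix that is a suffix of (first q characters of the pattern) + c. */
    void build_table(const char *pat)
    {
        int m = strlen(pat);
        char s[MAXM + 1];                    /* the string "state + character" */

        for (int q = 0; q <= m; q++) {
            if (q == m) {                    /* accept state: stay once matched */
                for (int c = 0; c < ALPHA; c++) delta[q][c] = m;
                continue;
            }
            memcpy(s, pat, q);
            for (int c = 0; c < ALPHA; c++) {
                s[q] = (char)c;
                int k = q + 1;               /* longest candidate prefix */
                while (k > 0 && memcmp(pat, s + q + 1 - k, k) != 0)
                    k--;                     /* shrink until prefix matches suffix */
                delta[q][c] = k;
            }
        }
    }

    /* Returns the index just past the first occurrence, or -1 if none. */
    int match(const char *pat, const char *text)
    {
        int m = strlen(pat), q = 0;
        for (int i = 0; text[i] != '\0'; i++) {
            q = delta[q][(unsigned char)text[i]];
            if (q == m) return i + 1;
        }
        return -1;
    }

    int main()
    {
        build_table("nano");
        printf("%d\n", match("nano", "banananona"));  /* prints 8: the match ends at index 7 */
        return 0;
    }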

Difference Between NFA and DFA

1. The transition function for an NFA (i.e., δ) is multi-valued, whereas for a DFA it is single-valued.
2. Checking membership is easy with a DFA, whereas it is harder with an NFA.
3. Constructing an NFA is usually easy, whereas constructing a DFA is harder.
4. The space required for a DFA is more, whereas for an NFA it is less.
5. In the state diagram of a DFA, for every symbol of the alphabet we specify exactly one state transition. For an NFA, we do not need to specify how the NFA reacts to every symbol. If the behavior of the NFA is not specified when it reads some symbol, we say the NFA (or that branch of the NFA) dies. An NFA can specify multiple states that a state can move to when it reads a symbol (a DFA can, and must, specify exactly one next state).
6. An NFA can use the symbol ε; a DFA cannot. ε means the NFA can spawn a new branch without reading any input symbol.
7. An NFA can be understood as a machine that can reproduce itself, so that multiple little machines compute at the same time, all listening for the next symbol at the same time.
8. An NFA does not reject the input string just because one of its branches dies or rejects the string. Only if all of the branches of the NFA die or reject the string do we say the NFA rejects the string. As long as one branch of the NFA accepts the string, we say the NFA accepts the string.
DFA Vs. NFA

(1)
·         For every symbol of the alphabet, there is only one state transition in a DFA.
·         In an NFA, we do not need to specify how the automaton reacts to every symbol.
(2)
·         A DFA cannot use empty-string (ε) transitions.
·         An NFA can use empty-string (ε) transitions.
(3)
·         A DFA can be understood as one machine.
·         An NFA can be understood as multiple little machines computing at the same time.

• When processing a string in a DFA, there is always a unique next state to go to when each character is read.
• This is because for each state in a DFA, there is exactly one transition that corresponds to each character being read.

• In an NFA, several choices (or no choice) may exist for the next state.
• An NFA can move to more than one state, or to nowhere.
• An NFA can move to a state without reading anything.


 
        The only difference between a DFA (deterministic finite automaton) and an NFA (nondeterministic finite automaton) is found in the transition function.

        An NFA's transition function is less restrictive than a DFA's because it allows transitions from a given state to zero, one, or more states for the *same input symbol*. On the other hand, a DFA specifies exactly one state that may be entered for a given state and input symbol combination.

        For example, let's say you are in state 0; then in an NFA you can have all these transitions upon reading input symbol 'a':
0 -> 1
0 -> 2
0 -> 0
You see that you have several choices as to which state you can enter from state 0 for the same input symbol 'a'.

        Contrast this to a DFA, where you should have exactly one transition going from state 0 upon reading input symbol 'a' (for example 0 -> 2). The DFA is called "deterministic" because it has one and only one transition for every possible input symbol at every state.

IMPORTANT:
          You should understand that even though an NFA may seem more powerful because it is less restrictive, this is not the case at all. It has been proven that the two are equivalent and that an NFA can *always* be reduced to a DFA. The real advantage of an NFA, then, is that it is sometimes easier, or more natural, to build an NFA than a DFA for some problems (that NFA can thereafter be reduced to a DFA using some algorithm).
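To see concretely how a deterministic program can keep up with an NFA, here is a small C sketch that simulates an NFA by tracking the *set* of states it could currently be in, which is exactly the idea behind the NFA-to-DFA reduction. The example NFA (it accepts strings over {a, b} whose second-to-last character is 'a') and the bitmask representation are illustrative assumptions:

    #include <stdio.h>

    /* States: 0 (start, loops on a and b), 1 (just saw an 'a'), 2 (accept).
       Transitions: 0 -a-> {0,1}, 0 -b-> {0}, 1 -a-> {2}, 1 -b-> {2}. */

    typedef unsigned int stateset;        /* bit i set <=> NFA could be in state i */

    stateset step(stateset cur, char c)
    {
        stateset next = 0;
        if (cur & 1u) {                   /* state 0 */
            next |= 1u;                   /* 0 -> 0 on any symbol */
            if (c == 'a') next |= 2u;     /* 0 -> 1 on 'a' */
        }
        if (cur & 2u)                     /* state 1: go to 2 on a or b */
            next |= 4u;
        return next;                      /* state 2 has no outgoing edges */
    }

    int accepts(const char *w)
    {
        stateset cur = 1u;                /* start in state 0 only */
        for (; *w; w++)
            cur = step(cur, *w);
        return (cur & 4u) != 0;           /* accept if state 2 is reachable */
    }

    int main()
    {
        printf("%d %d %d\n", accepts("ab"), accepts("bab"), accepts("abb"));
        /* prints 1 1 0 */
        return 0;
    }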



The relationship between DFA and NFA


    *  A DFA is a kind of NFA, but the reverse is not true.
    *  DFA and NFA have the same capability: every language that can be recognized by a DFA can also be recognized by an NFA, and the reverse is also true.
    *  Although they have the same power, it is usually easier to design an NFA, since we have more flexible ways to describe one.
    *  Both a DFA and an NFA can have only one start state. They can have multiple accept states.

      Formally, an automaton is made up of ⟨Σ, Q, q0, δ, F⟩, where δ is the transition function. In a DFA, δ takes as input a state and a letter and returns only one state. In an NFA, δ takes as input a state and a letter but returns a set of states.

       An NFA accepts a word iff there exists an accepting run of the automaton on it (intuitively, the automaton guesses such a run). A DFA has only one run on every word and therefore accepts a word iff that single run is accepting.

Monday, April 4, 2011

Formal Definition of Finite Automata

Discrete Mathematics/Finite state automata

Formally, a Deterministic Finite Automaton is a 5-tuple D = (Q,Σ,δ,s,F) where:
Q is the set of all states.
Σ is the alphabet being considered.
δ is the set of transitions, including exactly one transition per element of the alphabet per state.
s is the single starting state.
F is the set of all accepting states.
Similarly, the formal definition of a Nondeterministic Finite Automaton is a 5-tuple N = (Q,Σ,δ,s,F) where:
Q is the set of all states.
Σ is the alphabet being considered.
δ is the set of transitions, which may include epsilon transitions and any number of transitions (including none) for any particular input at each state.
s is the single starting state.
F is the set of all accepting states.
Note that for both an NFA and a DFA, s is not a set of states. Rather, it is a single state, as neither can begin at more than one state. However, an NFA can achieve a similar effect by adding a new starting state with epsilon transitions to all of the desired starting states.
The difference between a DFA and an NFA is that an NFA's delta-transitions are allowed to contain epsilon-jumps (transitions on no input), multiple transitions on the same input, and no transition at all for some elements of the alphabet.
For any NFA N, there exists a DFA D such that L(N) = L(D)


 A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of
  • a finite set of states (Q)
  • a finite set of input symbols called the alphabet (Σ)
  • a transition function (δ : Q × Σ → Q)
  • a start state (q0 ∈ Q)
  • a set of accept states (F ⊆ Q)
Let w = a1a2 ... an be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states, r0,r1, ..., rn, exists in Q with the following conditions:
  1. r0 = q0
  2. ri+1 = δ(ri, ai+1), for i = 0, ..., n−1
  3. rn ∈ F.
In words, the first condition says that the machine starts in the start state q0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings M accepts is the language recognized by M and this language is denoted by L(M).
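As a quick check of these conditions, take the even-length machine from earlier, with Q = {even, odd}, q0 = even, δ(even, c) = odd and δ(odd, c) = even for every character c, and F = {even}. On the input w = ab we get the run r0 = even, r1 = δ(even, a) = odd, r2 = δ(odd, b) = even; since r2 ∈ F, the machine accepts ab.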
A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton.

Markov Chains

      A Markov chain is a sequence of random values whose probabilities at each time step depend upon the value at the previous time. A simple example is the nonreturning random walk, where the walkers are restricted not to go back to the location just previously visited.
      The controlling factor in a Markov chain is the transition probability: a conditional probability for the system to go to a particular new state, given the current state of the system. For many problems, such as simulated annealing, the Markov chain obtains the much-desired importance sampling. This means that we get fairly efficient estimates if we can determine the proper transition probabilities.
Markov chains can be used to solve a very useful class of problems in a rather remarkable way. We will illustrate with the following problem: suppose we want to find the value of the vector x that is the solution to

    x = A x + f                                            (15)

where the matrix A and the vector f are known. By setting up a random walk through the matrix A we can solve for any single component of x.
A little mathematics is needed to see how this works. First let's symbolically solve (15),

    x = (I - A)^-1 f

This can be expanded to

    x = (I + A + A^2 + A^3 + ...) f                        (17)
Now let's suppose we have a matrix of probabilities P such that

    p_ij >= 0   and   sum_j p_ij <= 1   for every row i

and we have an array of termination probabilities

    g_i = 1 - sum_j p_ij

further we will define the weights

    v_ij = a_ij / p_ij

P can then describe a Markov chain where the states of the chain are the n integers indexing the matrix. The element p_ij gives the transition probability for the random walk to go from state i to state j. As long as g_i is not zero the walk will eventually terminate. The probability that the walk will terminate after state i is given by g_i.
While taking the random walk i = i0 -> i1 -> i2 -> ... we need to accumulate the product

    V_m = v_{i0 i1} v_{i1 i2} ... v_{i_{m-1} i_m}

and the sum

    W = f_{i0} + V_1 f_{i1} + V_2 f_{i2} + ...

The final W value is important because its mean value (averaged over the walks that start at index i) is

    <W> = f_i + (A f)_i + (A^2 f)_i + ... = [(I + A + A^2 + ...) f]_i      (23)
Notice that the final form of (23) is exactly the ith element from equation (17).
So to solve this problem we have three major steps:
  • Set up the probabilities p and g, and start the walk at the index at which we want to solve for x; let's call that index i.
  • Take a random walk until it terminates, accumulating the product V and the sum W.
  • Average the W values over several walks to obtain our estimate of x_i.
This will work as long as the series in equation (17) converges, which happens if the norm of A, ‖A‖, is less than one (the smaller ‖A‖ is, the faster the Monte Carlo estimate converges). If the norm is larger than one, all is not lost; there is usually some manipulation that can be done to get a new matrix that has a small norm.
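As a rough C sketch of these three steps (the function name estimate_xi, the uniform choice of p and g, and the small test system are illustrative assumptions, not part of the original description):

    #include <stdio.h>
    #include <stdlib.h>

    #define N 3    /* dimension of a small, made-up test system */

    /* Estimate x[i] for x = A x + f by random walks: transition probabilities
       p[i][j], termination probabilities g[i] = 1 - sum_j p[i][j], and the
       weights a[i][j]/p[i][j] accumulated along each walk. */
    double estimate_xi(const double a[N][N], const double p[N][N],
                       const double g[N], const double f[N], int i, int nwalks)
    {
        double total = 0.0;
        for (int w = 0; w < nwalks; w++) {
            int state = i;
            double v = 1.0;              /* running product of a/p weights      */
            double walk_sum = f[state];  /* W starts with the f term at index i */
            for (;;) {
                double r = (double)rand() / RAND_MAX;
                if (r < g[state]) break; /* the walk terminates here            */
                r -= g[state];
                int next = 0;            /* pick next state with prob p[state][next] */
                while (next < N - 1 && r >= p[state][next]) {
                    r -= p[state][next];
                    next++;
                }
                v *= a[state][next] / p[state][next];
                state = next;
                walk_sum += v * f[state];
            }
            total += walk_sum;
        }
        return total / nwalks;           /* average of W over the walks */
    }

    int main()
    {
        /* A small system x = A x + f whose norm is well below one. */
        const double a[N][N] = { { 0.1, 0.2, 0.1 },
                                 { 0.2, 0.1, 0.2 },
                                 { 0.1, 0.2, 0.1 } };
        const double f[N] = { 1.0, 2.0, 3.0 };
        /* Uniform transition probabilities, termination probability 0.4. */
        const double p[N][N] = { { 0.2, 0.2, 0.2 },
                                 { 0.2, 0.2, 0.2 },
                                 { 0.2, 0.2, 0.2 } };
        const double g[N] = { 0.4, 0.4, 0.4 };

        srand(12345);
        for (int i = 0; i < N; i++)
            printf("x[%d] is approximately %f\n", i, estimate_xi(a, p, g, f, i, 200000));
        return 0;
    }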
It turns out we can use this idea for all sorts of problems that have the same general form as (15). If we write (15) as

    x = A x + f                                            (25)

and now consider A to be any linear operator that can act on x, not just a matrix multiplication, then given the appropriate operator for a given problem we can use the above method to solve several kinds of problems. We can do a matrix inverse, i.e. solve

    x = H^-1 f

if we let A = I - H: starting the walks at index i gives us row i of H^-1. If we restrict the chains to start at index i and end at index j, then we obtain a single element of the inverse, (H^-1)_ij. Other problems that can be solved this way include the determination of eigenvalues and eigenvectors, and integral equations of the second kind such as

    x(t) = ∫ K(t, s) x(s) ds + f(t)                        (27)
Notice that equation (27) has the same kind of form as equation (25) (integration is a linear operator). If we made a discrete grid on which we wanted to solve (27), then we could use exactly the same code that we used to solve equation (15). However, in a practical application the dimension of the discretized equation (27) would be extremely large, or the matrix would be so complicated to calculate that it is not really practical to create a giant matrix to approximate the integral. Instead we free up our random walk to move continuously within the range of integration. Then the system is solved with a program that otherwise looks very much like the one used to solve the matrix problem.

State diagram

UML Elements: State Diagram
        The state diagram is used as a symbol for finite state machines. It may also be used to represent state transition tables. Of the 13 diagrams available in UML 2.0, the state diagram has some of the most variations. In addition to coming in different forms, it may also use various types of semantics. The traditional form that is used for the state diagram is the directed graph.

       The directed graph may come with a number of different elements, and some of these are states Q, input symbols, output symbols Z, edges, a start state q0, and accepting state(s) F. The states Q are a finite collection of vertices that will normally be symbolized by circles, and each may also be labeled with a special designation symbol. Words may also be written inside it. The input symbols are a finite group of input symbols that may also be designators. For example, if you're dealing with a Moore machine, or a deterministic finite state machine, the input may be defined on every edge, and this will typically be near the first state.

        Output symbols Z is a finite group of symbols or designators that are used for output. If you are working with a Mealy machine, both the input and output will be defined by every edge. If you are working with a Moore machine, the state of the output will generally be written within the circle of the state, and it may be split from the designator of the state with a slash symbol "/". The Edges will be used to symbolize the transitions that occur between two states caused by the input. They will be identified by their symbols which are drawn on the edges. An edge will generally be drawn as an arrow that points from the present state in the direction of another state.

Start State q0 and Accepting state(s) F

        The start state q0 will generally be represented by an arrow that "points to it from nowhere." It must be noted that in some of the examples below the start state is not shown, and it will need to be determined from the text. The accepting state(s) F are a group of double circles which will be used to define accept states. In some cases, the accept state(s) will function as the "final" states. Below are some examples which further explain this diagram and the elements which comprise it. If you're working with a Moore machine, every edge will be labeled along with the input. When you work with a Moore machine, you may be presented with both accept states and states.

        If you're working with a Mealy machine, the edge may be labeled in various ways, and you will have both input and output. The labels for each edge may vary depending on the situation. Another concept that you will want to become familiar with is the Harel statechart. These charts were created by David Harel in the 1980s, and they have become quite popular since they were added to the Unified Modeling Language.

        This diagram allows for the modeling of superstates, and it also supports concurrent state diagrams. The goal is to model activities in a way that makes them connected to the state. When you work with classic state diagrams, the machine will only be allowed to exist in one state at a time.

       When you use Harel statecharts, it will be possible to model the "and" machines, and this will allow the machine to exist in two states simultaneously. One reason this works is due to the modeling of the superstate, as well as the modeling of the concurrent machines. The UML state diagram is another important element that you will need to know. It is one of the many diagrams which make up UML 2.0, and it is closely connected to the Harel statechart in a number of novel ways. Understanding the importance of the state diagram is crucial to the function of the Unified Modeling Language.

State Diagram Explained

      The state diagram is basically a Harel statechart that uses a standard notation. It can be used to describe a number of things, and this includes business processes as well as computer applications.

      There are a number of notational elements which make up this diagram, and these are the filled circle, the hollow circle, the rounded rectangle, the arrow, and the thick horizontal line. The filled circle may be used to point to the initial state. The hollow circle may contain a smaller circle which is filled, and this will indicate the final state if there is any. The rounded rectangle will be used to define a state. The top of this rectangle will have the name of this state, and it may have a horizontal line in its center.

      Below the horizontal line at the center, activities which are done in the state may be listed. The arrow will be used to represent a transition. The name of an event may be listed, and a guard expression may also be used. If the guard expression is used, it will be enclosed within brackets. When this is done, it means that the expression must be true for the transition to occur. If any action is carried out during this transition, it may be labeled after the "/" symbol. A thick horizontal line, with more than one line entering and one leaving, or with one line entering and more than one leaving, will be used to represent a join or a fork, respectively.

       Another concept that you may frequently encounter in UML is extensions. One thing that many developers do is allow the arcs to flow from a certain number of states to another group of states. This will only be useful in situations where the system is allowed to be in various states simultaneously, and this implies that the individual states will only define a condition or another part of the global state.

      This formulation is known as the Petri net. There are additional extensions which will allow you to integrate your flowcharts inside the Harel state charts. When you use this extension, you can develop software that is driven by both events and workflow.


State machine diagram


 
        State machine diagrams specify state machines. This clause outlines the graphic elements that may be shown in state machine diagrams, and provides cross references where detailed information about the semantics and concrete notation for each element can be found. It also furnishes examples that illustrate how the graphic elements can be assembled into diagrams.

Friday, April 1, 2011

Regular Expressions


       In computing, a regular expression, also referred to as regex or regexp, provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
       The following examples illustrate a few specifications that could be expressed in a regular expression:
  •   The sequence of characters "car" appearing consecutively in any context, such as in "car", "cartoon", or "bicarbonate"
  • The sequence of characters "car" occurring in that order with other characters between them, such as in "Icelander" or "chandler"
  • The word "car" when it appears as an isolated word
  • The word "car" when preceded by the word "blue" or "red"
  • The word "car" when not preceded by the word "motor"
  • A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits (for example, "$100" or "$245.99").
Regular expressions can be much more complex than these examples.
       Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. Some of these languages, including Perl, Ruby, Awk, and Tcl, have fully integrated regular expressions into the syntax of the core language itself. Others like C, C++, .NET, Java, and Python instead provide access to regular expressions only through libraries. Utilities provided by Unix distributions—including the editor ed and the filter grep—were the first to popularize the concept of regular expressions.
       As an example of the syntax, the regular expression \bex can be used to search for all instances of the string "ex" that occur after "word boundaries" (signified by the \b). Thus \bex will find the matching string "ex" in two possible locations, (1) at the beginning of words, and (2) between two characters in a string, where one is a word character and the other is not a word character. For instance, in the string "Texts for experts", \bex matches the "ex" in "experts" but not in "Texts" (because the "ex" occurs inside a word and not immediately after a word boundary).
       Many modern computing systems provide wildcard characters in matching filenames from a file system. This is a core capability of many command-line shells and is also known as globbing. Wildcards differ from regular expressions in generally expressing only limited forms of patterns.
       A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements. For example, the set containing the three strings "Handel", "Händel", and "Haendel" can be described by the pattern H(ä|ae?)ndel (or alternatively, it is said that the pattern matches each of the three strings). In most formalisms, if there is any regex that matches a particular set then there is an infinite number of such expressions. Most formalisms provide the following operations to construct regular expressions.
          Boolean "or"
    A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey".

Grouping
Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" and "grey".


Quantification
A quantifier after a token (such as a character) or group specifies how often that preceding element is allowed to occur. The most common quantifiers are the question mark ?, the asterisk * (derived from the Kleene star), and the plus sign + (Kleene cross).
?
The question mark indicates there is zero or one of the preceding element. For example, colou?r matches both "color" and "colour".
*
The asterisk indicates there are zero or more of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on.
+
The plus sign indicates that there is one or more of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
These constructions can be combined to form arbitrarily complex expressions, much like one can construct arithmetical expressions from numbers and the operations +, ×, and ÷. For example, H(ae?|ä)ndel and H(a|ae|ä)ndel are both valid patterns which match the same strings as the earlier example, H(ä|ae?)ndel.
The precise syntax for regular expressions varies among tools and with context; more detail is given in the Syntax section.
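For instance, here is a minimal C sketch using the POSIX regcomp/regexec interface (the pattern and the test strings are just illustrations) to exercise the grouping and ? constructs described above:

    #include <regex.h>
    #include <stdio.h>

    int main()
    {
        regex_t re;
        const char *tests[] = { "color", "colour", "colouur", "gray" };

        /* REG_EXTENDED selects the extended syntax, so ( ), | and ? work as above. */
        if (regcomp(&re, "^colou?r$", REG_EXTENDED) != 0) {
            fprintf(stderr, "bad pattern\n");
            return 1;
        }
        for (int i = 0; i < 4; i++)
            printf("%-8s %s\n", tests[i],
                   regexec(&re, tests[i], 0, NULL, 0) == 0 ? "matches" : "does not match");
        regfree(&re);
        return 0;
    }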

The Structure of a Regular Expression

       There are three important parts to a regular expression. Anchors are used to specify the position of the pattern in relation to a line of text. Character sets match one or more characters in a single position. Modifiers specify how many times the previous character set is repeated. A simple example that demonstrates all three parts is the regular expression "^#*". The up arrow ("^") is an anchor that indicates the beginning of the line. The character "#" is a simple character set that matches the single character "#". The asterisk is a modifier. In a regular expression it specifies that the previous character set can appear any number of times, including zero. This is a useless regular expression, as you will see shortly.
       There are also two types of regular expressions: the "Basic" regular expression, and the "extended" regular expression. A few utilities like awk and egrep use the extended expression. Most use the "regular" regular expression. From now on, if I talk about a "regular expression," it describes a feature in both types.
       Here is a table of the Solaris (around 1991) commands that allow you to specify regular expressions:

    Utility     Regular Expression Type
    -------     -----------------------
    vi          Basic
    sed         Basic
    grep        Basic
    csplit      Basic
    dbx         Basic
    dbxtool     Basic
    more        Basic
    ed          Basic
    expr        Basic
    lex         Basic
    pg          Basic
    nl          Basic
    rdist       Basic
    awk         Extended
    nawk        Extended
    egrep       Extended
    EMACS       EMACS Regular Expressions
    PERL        PERL Regular Expressions

Non-deterministic Finite Automata

A nondeterministic finite automaton is a quintuple <Q, Σ, δ, q0, F>, where:


  • Q is a finite set of states;
  • Σ is a finite set of input symbols;
  • δ is the (possibly partial) transition function: it takes a state in Q and an input symbol in Σ as arguments and returns a subset of Q;
  • q0, an element of Q, is called the initial state;
  • F, contained in Q, is called the set of final states.
An input string is accepted by a nondeterministic finite automaton if and only if at least one of the possible transition sequences, defined by the input, leads to a final state.
          As language recognizer devices, nondeterministic automata are not more powerful than their deterministic counterpart, indeed they accept the same class of languages, that is to say they are equivalent. However, they are very often more convenient to use. In many cases, the most direct formalization of a problem is in terms of a nondeterministic automaton.


       In the theory of computation, a nondeterministic finite state machine or nondeterministic finite automaton (NFA) is a finite state machine where for each pair of state and input symbol there may be several possible next states. This distinguishes it from the deterministic finite automaton (DFA), where the next possible state is uniquely determined. Although the DFA and NFA have distinct definitions, it may be shown in the formal theory that they are equivalent, in that, for any given NFA, one may construct an equivalent DFA, and vice-versa: this is the powerset construction. Both types of automata recognize only regular languages. Non-deterministic finite state machines are sometimes studied by the name subshifts of finite type. Non-deterministic finite state machines are generalized by probabilistic automata, which assign a probability to each state transition.

Formal Definition
       Two similar types of NFAs are commonly defined: the NFA and the NFA with ε-moves. The ordinary NFA is defined as a 5-tuple, (Q, Σ, T, q0, F), consisting of
  • a finite set of states Q
  • a finite set of input symbols Σ
  • a transition function T : Q × Σ → P(Q).
  • an initial (or start) state q0 ∈ Q
  • a set of states F distinguished as accepting (or final) states, F ⊆ Q.
Here, P(Q) denotes the power set of Q. The NFA with ε-moves (also sometimes called NFA-epsilon or NFA-lambda) replaces the transition function with one that allows the empty string ε as a possible input, so that one has instead
T : Q × (Σ ∪{ε}) → P(Q).
It can be shown that ordinary NFA and NFA with epsilon moves are equivalent, in that, given either one, one can construct the other, which recognizes the same language.
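As a small illustration of how ε-moves can be handled, here is a C sketch that computes the ε-closure of a set of states, that is, everything reachable through ε-moves alone (the first step in simulating an NFA-ε or converting it to a DFA). The four-state NFA and its ε-edges are made up for the example, and sets of states are represented as bitmasks:

    #include <stdio.h>

    #define NSTATES 4

    /* eps[q] = bitmask of states reachable from q by a single ε-move. */
    static const unsigned eps[NSTATES] = {
        /* state 0 */ 1u << 1,                /* 0 --ε--> 1             */
        /* state 1 */ (1u << 2) | (1u << 3),  /* 1 --ε--> 2, 1 --ε--> 3 */
        /* state 2 */ 0,
        /* state 3 */ 0,
    };

    unsigned eclose(unsigned set)
    {
        unsigned closure = set, old;
        do {                                  /* keep adding ε-successors     */
            old = closure;
            for (int q = 0; q < NSTATES; q++)
                if (closure & (1u << q))
                    closure |= eps[q];
        } while (closure != old);             /* until nothing new is added   */
        return closure;
    }

    int main()
    {
        printf("closure of {0} = 0x%x\n", eclose(1u << 0));   /* prints 0xf */
        return 0;
    }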