Monday, February 4, 2013

VKEDCO: Vladimir Kulyukin's Education Coop: Python & Perl ...


There is a one-to-one correspondence between finite state automata (FSA) and regular expressions in the sense that every regular expression can be compiled into an FSA and for every FSA there is an equivalent regular expression. Equivalence in this context is construed as the equivalence of languages. In other words, an FSA and a regular expression are equivalent if and only if they accept/recognize the same language.

Suppose we want to implement a finite state machine (FSM) and use it in pattern matching. The abbreviations FSA and FSM are interchangeable. The most important aspect of an FSM is its transition table. Consider an FSM in Figure 1.


This FSM has two states {1, 2}. The start state is 1 and the end state is 2. The language accepted by this FSM is {a}. To put it differently, this automaton accepts only one string that consists of the symbol a.
?
We can represent the transition table of this FSA with a Python dictionary or a Perl hash.

tran_tbl_01 = {}
tran_tbl_01['a'] = {1 : [2]}

?
The above code fragment represents the FSM's transition table as a dictionary of dictionaries. The first dictionary takes a symbol, e.g., 'a', and maps it to another dictionary that maps states to lists of states. In other words, when reading 'a', in state 1, the FSM can transition to any state in the list [2]. In this case, this list contains only 1 state, but it can have multiple states or be empty.

In Perl, we can realize the same ideas as follows:

my %tran_tbl_01_a = (1, [2]);
my %tran_tbl_01 = ('a', \%tran_tbl_01_a);





We first obtain a hash (%hash_tbl_01) that maps 1 to [2] and then place its reference into another hash (%tran_tbl_01)?
under the key 'a'.?

Once we have an FSA's transition table, we need to access its elements. Here is a way to do it in Python.

def tran_table_lookup(sym, state, tran_tbl):
??? if tran_tbl.has_key(sym):
??????? return tran_tbl.get(sym, []).get(state, [])
??? else:
??????? return []

def tran_table_epsilon_lookup(state, tran_tbl):
??? return tran_table_lookup('', state, tran_tbl)

Note that we encode the epsilon as ''. Recall that epsilon transitions allow the FSA to transition from its current state to another state without consuming any input.

Here is how the same access functionality can be implemented in Perl:

sub tran_table_lookup {
? my ($sym, $state, $tran_tbl) = @_;

? ## check if $sym exists in $tran_tbl.
? if ( exists($tran_tbl->{$sym}) ) {

??? ## if it does, get the hash reference that maps?
? ? ## individual states to lists of states
??? my $state_to_states = $tran_tbl->{$sym};

??? ## check if the current state $state exists as a key
??? if ( exists($state_to_states->{$state}) ) {

????? ## if it does, return the reference to the corresponding list of states
????? return $state_to_states->{$state};
??? }
??? else {

????? ## return an empty list reference
????? my @empty_ary = ();
????? return \@empty_ary;
??? }
? }
}

sub tran_table_epsilon_lookup {
? my ($state, $tran_tbl) = @_;
? return tran_table_lookup('', $state, $tran_tbl);
}
?
?
An FSA can be represented as a 3-tuple of a start state, a list of final states, an transition table. Here is a Python realization of this representational choice:

fsa_01 = (1, [2], tran_tbl_01)

def get_start_state(fsa): return fsa[0]
def get_fin_states(fsa): return fsa[1]
def get_tran_table(fsa): return fsa[2]

Perl's implementation is similar:

my @fsa_01 = (1, [3], \%tran_tbl_01);

sub get_start_state {
?return $_[0];
}

sub get_fin_states {
? return $_[1];
}

sub get_tran_table {
? return $_[2];
}?

Let i be the current position in some text txt, n - the length of txt, cur_state is the current state of the FSA, fin_states is the FSA's final states, and tran_tbl is the FSA's transition table. Then, given an FSA, we can use the following method to see if txt is accepted by the FSA.

match_fsa(txt, i, n, cur_state, fin_states, tran_tbl):
? ? if ( i == n ):
? ? ? ? if ( cur_state is in fin_states ):
? ? ? ? ? ? return true
? ? ? ? else:
? ? ? ? ? ? next_epsilon_states = states the FSA can get to on epsilon from cur_state
? ? ? ? ? ? for nes in next_epsilon_states:
? ? ? ? ? ? ? ? if nes is in fin_states:
? ? ? ? ? ? ? ? ?? return true
? ? ? ? ? ? return false
? ? else:
? ? ? ?? next_states = states the FSA can get to from cur_state on txt[i]
? ? ? ?? next_epsilon_states = states the FSA can get to from cur_state on epsilon
? ? ? ?? if (next_states and next_epsilon_states are both empty):
? ? ? ? ? ?? return false
? ? ? ?? else:
? ? ? ? ? ?? for ns in next_states:
? ? ? ? ? ? ? ?? rslt = match_fsa(txt, i+1, n, ns, fin_states, tran_tbl)
? ? ? ? ? ? ? ?? if ( rslt is true ): return true
? ? ? ? ? ?? for nes in next_epsilon_states:
? ? ? ? ? ? ? ?? rslt = match_fsa(txt, i, n, nes, fin_states, tran_tbl)
? ? ? ? ? ? ? ?? if ( rslt is true): return true
? ? ? ? ? ?? return false
?? ? ? ? ? ? ? ? ? ?? ?
What To Implement
1. Implement match_fsa in Python & Perl.?

2. Build two FSA's that accept languages {(ab)^n | n >= 1} and {a^n | n is even} U {b^n | n is odd}. One FSA for each language.
?

3. Construct two regular expressions in Python and Perl for the same languages.?

4. Test both FSAs and your regular expressions on the following set of strings: '', 'ab', 'abab', 'ababab', 'abbb', 'aaaa', 'aaa', 'aaaaaa', 'b', 'bbb', 'bbbbb', 'abbaabba'.?

5. Do you notice any difference between your implementation of match_fsa and the way the native regex engines do the matching? Briefly (no more than 3 sentences) explain what is the difference, if there is any.?

What & Where To Submit
1. Create a subfolder hw_04 in your Dropbox folder and submit two files there: fsa.py and fsa.pl.

2. The files should contain your implementations of match_fsa, your regular expressions, and your answer to question 5.

Source: http://vkedco.blogspot.com/2013/02/python-perl-matching-text-patterns-with.html

taxes game of thrones season 2 trailer sag award winners girl scout cookies screen actors guild royal rumble results sag awards 2012

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.