Logs parsing in Python

In my previous articles, I have written about the basic file operations in Python. However, there are more options available in Python for processing files.

Typically, any well-developed application should have associated log files for analysis, monitoring & audit purposes, in order to track the application's behavior. Usually, these log files keep growing whenever an event or trigger happens in the application. So if we want to monitor that kind of log file, we need an interactive file operation in place to read those logs.

Nowadays there are many tools available in the market for application monitoring, but they are limited by licensing. Say there is a requirement to monitor a particular error pattern in a large log file that keeps getting updated whenever a customer performs any kind of banking transaction. Though we had monitoring tools like Splunk in place, due to space constraints it was raised to us to monitor that transaction log file and trigger an alert if that error pattern was identified in it. To achieve this, we used the FILEINPUT Python module, which interactively reads the log file and raises an email alert to our monitoring team.

How can it be interactive? It means fileinput can read the running logs and instantly help us parse their contents.

>>> import fileinput
>>> for i in fileinput.input():
... print "This is read intractive mode word:- "+i
... 
Apple   --Stdin input given
This is read in interactive mode, word:- Apple   --Given word displayed here along with print statement
Ball    --Stdin input given  
This is read in interactive mode, word:- Ball
cat     --Stdin input given
This is read in interactive mode, word:- cat

The example below gives more detail on how we benefit from the fileinput module.

>>> import glob -- glob is the module to perform wildcard (pattern) searches on file names.
>>> import fileinput -- the fileinput module has to be imported before we can use it.
>>> logs = fileinput.input(glob.glob("Python*")) -- create the fileinput object, reading all files whose names start with "Python".

Let's explore the methods and attributes associated with this fileinput object.

>>> dir(logs)
['__del__', '__doc__', '__getitem__', '__init__', '__iter__', '__module__', '_backup', '_backupfilename', '_file', '_filelineno', '_filename', '_files', '_inplace', '_isstdin', '_mode', '_openhook', '_output', '_readline', '_savestdout', '_startlineno', 'close', 'filelineno', 'filename', 'fileno', 'isfirstline', 'isstdin', 'lineno', 'next', 'nextfile', 'readline']

Let me go back to the log file monitoring requirement: monitor the Tomcat web server's catalina.out log and raise an alert if the pattern "Out of Memory" is found in the logs.

To achieve this requirement, the fileinput module is the best option,

>>> logs = fileinput.input(glob.glob('Catalina.log.*')) --reading the catalina logs
>>> logs.filename() --until we start reading the file, filename() will not return the name of the file being read
>>> 
>>> logs.readline()  --reading the first line of the file
'I am the first line from catalina.out log\n'
>>> logs.filename() --now I get the name of the file that is currently active
'Catalina.log.19092017_01'
>>> logs.filelineno() --to know the line number within the file at which the error pattern was found
1
>>> logs.isfirstline() --useful for validation such as "whenever my monitor script reads the first line of a file, do this"
True  --returns a boolean value
>>> logs.nextfile()  --to start reading the next (2nd hour) file
>>> logs.next() -- read the next line from the file
'This is another first line in second file read\n'
>>> logs.filename()
'Catalina.log.19092017_02'
>>> logs.lineno() --the cumulative line number across this whole operation
2      --this is the second line read through the fileinput object
>>> logs.filelineno() --however, this is the first line of the second file, hence the file line number is 1
1
>>> logs.isstdin() --to check whether the current input is being read from standard input
False

Using these methods and attributes, it is an easy operation to perform pattern searching and parsing on running logs.
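
To tie the requirement together, here is a minimal sketch of how the pieces above could be combined to scan the Catalina logs for the error pattern and raise an alert. The send_alert() helper is a made-up placeholder; a real script would hook it up to email or to the monitoring team's tooling.

# A minimal sketch, assuming Catalina.log.* files exist in the current directory
# and that send_alert() is a hypothetical helper wired to your email/monitoring setup.
import fileinput
import glob

PATTERN = "Out of Memory"

def send_alert(message):
    # placeholder: in a real script this would raise an email alert to the monitoring team
    print "ALERT: " + message

for line in fileinput.input(glob.glob("Catalina.log.*")):
    if PATTERN in line:
        send_alert("%s found in %s at line %d"
                   % (PATTERN, fileinput.filename(), fileinput.filelineno()))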


Files in Python

Many well-known applications have been developed using Python, and one of its strengths is extensive file handling.

I had been assigned to create a Unix shell script which has to read input files from different FTP locations and parse the content of those files into a proper data structure format, to pass as input to the DM express tool (a business intelligence data integration tool).

With Unix shell scripting, you can perform file read operations using control flow constructs such as for and while loops, which is fine for handling small files, but the requirement was to read large files containing Rx (tablets) and doctor prescription information sold in each and every medical shop of every location across the countries. You can imagine how big the file will be!


So I was looking for a scripting language with professional file handling operations and found that Perl & Python came pre-installed on the Unix server machines as additional languages.

Perl has powerful file operations that perform rapid I/O on files, but my further requirements on parsing the files were not easily achievable with Perl.

So I chose Python and was amazed by its file I/O operations and content parsing.
With file operations in Python, you can work on a single file, multiple files, file streams, zipped files & tar files directly.

The basic file operations in Python are,

  1. Open
  2. Read
  3. Write
  4. Append
  5. Close

Let's go directly to a simple file read operation,

>>> fileobj = open("Python_train","r") -- File object.
>>> fileobj.read()  --Reading the file thorugh the file object.
'This is the first linensecond linenthird lineni think, this is enough and will write further whenever neededn'
>>> fileobj.read()
''  --File object can be referenced only one time.

In Python, everything is an object, so whatever you want to reference can be handled as an object. As shown in the above example, first I created the file object (fileobj), which references the file opened in read mode. In the next line, the file object is used to access the contents of the file.

You should have noticed the additional character "\n", which moves the printing position to the start of the next line; this will be very helpful when you parse the file.
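
For instance, a quick sketch of why "\n" matters while parsing; the string below is just the sample content from above, split into lines:

>>> data = 'This is the first line\nsecond line\nthird line\n'
>>> data.split('\n')   -- the newline character is what lets us break the content into separate lines
['This is the first line', 'second line', 'third line', '']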

readline()
>>> fileobj = open("Python_train","r")
>>>
>>> fileobj.readline()   --the file object reads the content of the file one line at a time.
'This is the first line\n'
>>> fileobj.readline()
'second line\n'
>>> fileobj.readline()
'third line\n'
>>> fileobj.readline()
'i think, this is enough and will write further whenever needed\n'
>>> fileobj.readline()
''

With the same open command, we can write to the file by using fileobj as shown below,

>>> fileobj = open("Python_train","w")
>>> 
>>> fileobj.write('This is first line written with fileobj')
>>> fileobj.write('This is second line written with fileobj')
>>> fileobj.write('This is third line written with fileobj')
>>> fileobj.write('This is final line enough written with filobj')

Now there is another method, readlines().

>>> fileobj = open("Python_train","r")
>>> fileobj.readlines() -- the whole file is read into a list of lines; here it comes back as a single element because we never wrote any newlines
['This is first line written with fileobjThis is second line written with fileobjThis is third line written with fileobjThis is final line enough written with filobj']
>>>

So what is the difference noticed here between the read() & readlines() methods on fileobj? readlines() returns the file content as a list, splitting on the newline ("\n") separator, whereas read() returns the whole file as one string. Since we wrote the file without any newlines, readlines() came back with a single element.
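
To see readlines() splitting properly, here is a small sketch using a separate scratch file (Python_train_demo is just a made-up name, so the example above is left untouched):

>>> demo = open("Python_train_demo","w")
>>> demo.write('first line\nsecond line\nthird line\n')
>>> demo.close()
>>> demo = open("Python_train_demo","r")
>>> demo.readlines()   -- each "\n"-terminated line becomes one element of the list
['first line\n', 'second line\n', 'third line\n']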

So the file object methods can be chosen based on your requirements while parsing files. Further, if you need to read a specific number of characters, you can pass that count as a parameter to read() as below,

>>> fileobj = open("Python_train","r")
>>> fileobj.read(10)  -- read only the first 10 characters from the file
'This is fi'
>>> a = fileobj.read(20) -- read the next 20 characters from the file 
>>> print a
rst line written wit --since the first 10 characters have already been read, the next read starts from the current position.
>>> len(a)  -- let's validate the length of the text read from the file.
20
>>>

An important thing to remember when using the file object: opening a file in write ("w") mode always truncates it and starts a fresh file, so be very careful about when to use write mode. If you want to keep writing to an existing file, open it in append mode as shown below,

>>> fileobj = open("Python_train","r")
>>> fileobj.read()
'Its writing me as new line in the Python_train file and existing data has been cleaned'
>>> 
>>> 
>>> fileobj = open("Python_train","a") --Append parameter on open method
>>> fileobj.write("I want to keep previous contents and wants to add more in the file")
>>> 
>>> fileobj = open("Python_train","r")
>>> 
>>> fileobj.read() -- The appended line does not written in new line rather it is updated in the same row.
'Its writing me as new line in the Python_train file and existing data has been cleanedI want to keep previous contents and wants to add more in the file'
>>> fileobj = open("Python_train","a")
>>> fileobj.write("/n Wants to write in second line") --Write in new line
>>> 
>>> fileobj.writelines("Another way to write in the file") --Writelines specifically useful if you want to write in file from any list
>>> 
>>> fileobj = open("Python_train","r")
>>> fileobj.read()
'Its writing me as new line in the Python_train file and existing data has been cleanedI want to keep previous contents and wants to add more in the file/n Wants to write in second lineAnother way to write in the file'
>>>

Also, the file object has some useful methods and attributes which are definitely helpful when we perform file operations,

>>> dir(fileobj)
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']

close: This method should be used to close the files that have been used in our program. Even if you do not call close(), Python's memory management will eventually clean up objects that are no longer referenced, but closing explicitly is the safer practice.

>>> fileobj = open("Python_train","r")
>>> fileobj.read()
'Its writing me as new line in the Python_train file and existing data has been cleaned\nI want to keep previous contents and wants to add more in the file \nWants to write in second line\nAnother to write in the new line\n'
>>> 
>>> fileobj.close()  --Close the file
>>> 
>>> fileobj   --Explicitly calling the file object to know the status.
<closed file 'Python_train', mode 'r' at 0x7fa6d995c5d0>

closed: This attribute is used to validate whether the file is in closed status or not; checking it is a good practice in your code. It returns the boolean True if the file is closed and False if it is not.

>>> fileobj.closed
True
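
Since it is easy to forget close(), the with statement is the usual way to guarantee it; a minimal sketch reusing the same file:

>>> with open("Python_train", "r") as fileobj:
...     data = fileobj.read()
... 
>>> fileobj.closed    -- the file is closed automatically when the with block ends
True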

Also, there are other attributes which can be used for various validations during file operations, such as,

>>> fileobj = open("Python_train","r")
>>> 
>>> fileobj.mode  -- validate the mode the file has been opened in.
'r'       --mode returns "r", which tells us the file is in read mode.
>>> 
>>> fileobj.name   --to validate the name of the file currently being processed
'Python_train'
>>> fileobj.fileno() --returns the file descriptor, the integer handle the kernel uses to refer to the open file. 
3
>>> fileobj.isatty()  --to check whether the file object is connected to a terminal (an interactive stream).
False   --the file I have opened is a regular file, not a terminal.

In this article, I have covered the basic Python file objects & methods as much as possible. Please leave your comments if you have any queries to discuss further.

I will cover more Python file operations in upcoming articles: how to read multiple files, reading logs, etc.

 

Data Structure in Python

In any programming language, a data structure is a specialized format for handling and storing data. The application being developed should understand the structure of the data it processes.

If you are a computer science student, there would be a separate subject about data structures. It is about how data is handled in a structured format for processing, organizing and storing. There are several types of data structures to represent the data in a program.

Types of Data Structure:

  1. Array
  2. Boolean
  3. String
  4. Integer
  5. Character
  6. Float
  7. Double
  8. Enum
  9. Union
  10. Record
  11. Stack
  12. Queue
  13. Linked List
  14. set
  15. Associative Array
  16. Tree, etc..

Again, these types are categorized based on the kind of data: primitive, composite & abstract data types.

Python has special built-in data structures such as,

  • List
  • Tuple
  • Set
  • Dictionary

With all these data structures, let's see how effectively we can handle data and represent it in any format.

List:

The list is one of the most useful sequence types for structuring data in Python. Using lists we can organize data in stack, queue & tree formats. List indexing always starts at 0.

Below is an example of how we can use a list,

 --To initialize the list
>>> Emp1=[]    --- any data represented inside [] is called a list.
>>> type(Emp1)
<class 'list'>

The list has many methods to handle the data inside the list as below,

>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

Append: This method adds a value to the list; append() adds the data at the end of the list.

>>> Emp1.append('emp-a') #Adding employee name
>>> Emp1.append('1001') #employee number
>>> Emp1.append('01-01-2017') #employee date of birth
>>> Emp1.append(1000.00) #salary
>>>
>>>
>>> Emp1
['emp-a', '1001', '01-01-2017', 1000.0]

What if I want to insert the employee department just before the salary? We can use the insert method.

Insert: Method used to insert data at a specified index in the list, as below.

>>> Emp1
['emp-a', '1001', '01-01-2017', 1000.0]
>>> Emp1.insert('Development')
Traceback (most recent call last):
 File "<pyshell#39>", line 1, in <module>
 Emp1.insert('Development')
TypeError: insert() takes exactly 2 arguments (1 given) --- insert() expects two arguments: the index in the list where the new data should go, and the data itself.
>>>
>>> Emp1.insert(3,'Development') -- insert the dept name 'Development' at index 3 of Emp1, just before the salary.
>>> Emp1
['emp-a', '1001', '01-01-2017', 'Development', 1000.0]

One thing I noticed here is that I could add only one data element at a time to the list. To add all the details of an employee in a single execution, we can use the extend method.

Extend: Method to add values to the list. Extend looks like append, but it can accommodate multiple values in one execution.

>>> Emp2 = []     -- Initiated another list Emp2
>>> Emp2.extend('emp-b')  -- extend the list with the employee name 'emp-b'
>>> 
>>> Emp2
['e', 'm', 'p', '-', 'b'] -- oops, it did not act like append. extend() treats its argument as an iterable, meaning it adds the given data item by item (here, character by character).
>>> Emp2.append('emp-b')  -- hope you have noticed the difference between extend & append.
>>> 
>>> Emp2
['e', 'm', 'p', '-', 'b', 'emp-b']

So far we have seen how to add data to a list; now we will play around with the data using the remove, pop, index, count, sort & reverse methods, plus the del statement.

>>> Emp1 = ['emp-a','1001','01-01-2017',1000.00]
>>> Emp1.remove(1000.00) --- want to remove the salary from this list
>>> Emp1
['emp-a', '1001', '01-01-2017']

#If I want to take an item out of the Emp1 list while processing it, I can use the pop method, which removes the item and returns it.
>>> Emp1
['emp-a', '1001', '01-01-2017']
>>> Emp1.pop(2) #I want to check the employee's DOB and at the same time remove it from the list.
'01-01-2017'

#Before popping items from the list, how can I find the index of a value?
>>> Emp1.index('1001')  -- check the index of the employee id in the list.
1
>>> #I have created new list with all employee names
>>> Emp_name = ['a', 'b', 'c', 'd', 'a']
>>> Emp_name
['a', 'b', 'c', 'd', 'a']
>>> #I want to check how many employees have the name "a"
>>> Emp_name.count('a') --count() returns the number of times the value is present in the list.
2
>>> #I want to order the employee list in ascending order.
>>> Emp_name.sort()  -- sort() is the method to sort the data inside the list.
>>> Emp_name
['a', 'a', 'b', 'c', 'd']   -- the data in this list has been sorted in alphabetical order. 
>>> #I want to order the employee list in descending order.
>>> Emp_name.sort(reverse=True)  -- sort() takes a reverse parameter to customize the ordering.
>>> 
>>> Emp_name
['d', 'c', 'b', 'a', 'a']
>>> #The list also has a straightforward method to reverse itself.
>>> Emp_name.reverse()
>>> Emp_name
['a', 'a', 'b', 'c', 'd']
>>> #del can delete any item from the list using its index, but the way of using it is slightly different.
>>> del Emp_name[4] -- pop() returns the removed value after deleting it from the list, but del does not.
>>> 
>>> Emp_name
['d', 'c', 'b', 'a']

The list is mutable in nature, hence it has all these associated methods for working with the data. You can also create arrays, stacks & queues using lists, and lists can be nested, as below,

#Lists are flexible: they can be combined with one another.
>>> Emp1 = ['A',1,'Class-A']
>>> Emp2 = ['B',2,'Class-B']
>>> Emp3 = ['C',3,'Class-C']
>>> Comp = Emp1 + Emp2 + Emp3 -- Combining the list.
>>> Comp
['A', 1, 'Class-A', 'B', 2, 'Class-B', 'C', 3, 'Class-C']
#Nested list. In other words, using lists we can create an "n"-dimensional array.
>>> Comp1 = [Emp1,Emp2,Emp3]  -- Creating the Nested List
>>> Comp1
[['A', 1, 'Class-A'], ['B', 2, 'Class-B'], ['C', 3, 'Class-C']]
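
As mentioned earlier, a list can also serve as a stack or a queue; here is a minimal sketch (for a real queue, collections.deque would be the more efficient choice):

>>> stack = []
>>> stack.append('task-1')    # push onto the stack
>>> stack.append('task-2')
>>> stack.pop()               # pop returns the last item pushed (LIFO)
'task-2'
>>> queue = ['job-1', 'job-2']
>>> queue.pop(0)              # popping from the front gives FIFO behaviour (slow for large lists)
'job-1'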

Also, using a list comprehension, we can perform any kind of mathematical expression to arrive at a complete list.

>>> [x**2 for x in range(10)]   -- a list comprehension: a for loop (and optionally an if) inside the list literal.
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Added to that, there are tools in Python to perform functional programming on lists, as below,

>>> #map applies a function (here, a validation) to every element of the given list
>>> Emp_name
['d', 'c', 'b', 'a']
>>> map(lambda x: x=='a',Emp_name)   -- I want to check whether employee 'a' is in the Emp_name list
[False, False, False, True]   -- returns True for the last element, since its value is 'a'.

>>> #filter performs a filtering operation on the list.
>>> filter(lambda x: x=='A', Comp) -- I want to pick out the entries equal to 'A'.
['A']
>>> # Reduce function to perform reduce (combine) operation on list.
>>> Plus_word = ['Welcome ','to ','LEARNINONE.COM']
>>> reduce(lambda x,y:x+y, Plus_word) --reduce the list to one.
'Welcome to LEARNINONE.COM' 

Note that in Python 3, the reduce functionality is available from the functools module rather than as a builtin.
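
A minimal sketch of the same reduce call as it would look on Python 3:

>>> from functools import reduce
>>> Plus_word = ['Welcome ', 'to ', 'LEARNINONE.COM']
>>> reduce(lambda x, y: x + y, Plus_word)
'Welcome to LEARNINONE.COM'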

Tuples:

The tuple is another sequence type of data structure in Python. Though a tuple is a sequence type like the list, it is immutable: its data elements cannot be changed after creation. So choose a tuple when you want your data to stay constant throughout the code.

>>> dept_name = 'Computer','Mechanicle','Electrical'
>>> dept_name
('Computer', 'Mechanicle', 'Electrical')
>>> # when you assign multiple comma-separated values to a variable, Python automatically packs them into a tuple.
>>> dir(tuple)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
>>> 
>>> dept_name.index('Computer')
0
>>> dept_name.index('Mechanicle')
1
>>> dept_name.count('Electrical')
1
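
And a quick check of the immutability mentioned above, reusing the same dept_name tuple:

>>> dept_name[0] = 'Maths'   -- trying to change an element of the tuple fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment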

Set:

The set is another data structure used in Python; it is created with the set() constructor. Unlike tuples & lists, a set keeps only distinct data elements and removes duplicates.

Note: Curly brackets can also be used to write set literals (from Python 2.7, and in Python 3).
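
A minimal sketch of the curly-bracket form (shown on Python 2.7, where the repr still prints as set([...])):

>>> {'a', 'a', 'b'}    -- duplicates are dropped at creation time
set(['a', 'b'])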

The example below gives more explanation,

>>> Emp_name = ['a','a','b','c','d']
>>> Emp_name
['a', 'a', 'b', 'c', 'd']
>>> set(Emp_name)  --Removed the duplicates from list Emp_name.
set(['a', 'c', 'b', 'd'])

As with lists, we can also use a comprehension to build a set.

>>> set(x**2 for x in range(9)) --squares of the numbers 0 to 8
set([0, 1, 4, 64, 9, 16, 49, 25, 36])

Dictionary:

Another wonderful data structure available to the Python programmer is the dictionary. It fulfils the associative array data structure type: a dictionary handles data that comes in key-value pair format, as below,

>>> Emp_dept_link={}  --Initializing the dictionary 
>>>
>>> Emp_dept_link
{}    -- an empty dictionary, with no key-value pairs yet.
>>> dir(Emp_dept_link)   --methods available to work with key-value pair data
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']
>>> Emp_dept_link['a']='Computer Science' -- adding new employee 'a' with dept name 'Computer Science'.
>>>
>>> Emp_dept_link
{'a': 'Computer Science'}
>>> Emp_dept_link['b']='Mechanicle' -- adding new employee 'b' with dept name 'Mechanicle'.
>>> Emp_dept_link
{'a': 'Computer Science', 'b': 'Mechanicle'}
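
Once the pairs are in place, values are looked up by their key; a minimal sketch (get() is the safer lookup when the key might be missing):

>>> Emp_dept_link['a']                       -- direct lookup by key
'Computer Science'
>>> Emp_dept_link.get('z', 'Not assigned')   -- get() returns a default instead of raising KeyError
'Not assigned'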

With the dict() constructor we can create a dictionary from a sequence of key-value pairs, as below,

>>> a=[('a','Computer Science'),('b','Mechanicle'),('c','Electrical'),('d','Electronics')]
>>>
>>> a
[('a', 'Computer Science'), ('b', 'Mechanicle'), ('c', 'Electrical'), ('d', 'Electronics')]
>>>
>>> dict(a)
{'a': 'Computer Science', 'c': 'Electrical', 'b': 'Mechanicle', 'd': 'Electronics'}
>>> dict(a=1001, b=1002, c=1003,d=1004)  -- Assign the employee id value to employee name.
{'a': 1001, 'c': 1003, 'b': 1002, 'd': 1004}

As with lists & sets, a dict comprehension can be used to create a dictionary from arbitrary key & value expressions, as below.

>>> {x: x**2 for x in range(9)}  
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64}

We will look deeper into lists, tuples, sets & dictionaries in upcoming articles.

What is Python


There was a requirement for me to recruit a resource with scripting knowledge, so I raised the resource requirement to my manager. Unfortunately, at the same time, there was another requirement in our co-team's operations to find a resource with domain knowledge. Since two teams required two resources, and for budget reasons, our manager asked us to find one resource who has both domain knowledge and scripting knowledge.

So now it's on me to prepare the JD for this role,

  1. The resource should be a very dynamic person, understanding things without anything being declared upfront.
  2. A quick decision maker whenever issues flow in.
  3. Understands the data well enough to prepare structured reports.
  4. Good to have knowledge of general coding standards, networking, operating systems, etc.
  5. Identifies the error, handles the error and reports the error in a meaningful way.
  6. Since the resource is shared between the support & operations teams, they should have the OOP-like characteristic of "inheriting" themselves to other team members by conducting training and KT sessions, and should not create dependencies.

Maybe we could find a resource matching these expectations, but beyond that we also need that resource to be easily understandable, an interactive interpreter, productive, versatile on start & end times, and last but not least, able to quickly compile the internal process and provide an outcome.

After so many iterations of the interview process, we finally found a very good resource named Python.

Being in production support, achieving internal ops-oriented automation with the limited options of shell script, batch script or SQL is not always up to the mark. Luckily we found Python, which has many characteristics we can use to achieve almost any kind of peripheral automation without relying on the development team or environment.

Most people who start their career in production support first learn shell scripting and SQL (Structured Query Language).

Almost all Unix/Linux servers come with Python preinstalled (typically version 2.7 or earlier), ready to use with its interactive interpreter.

I started this article with a real-time scenario so that it would be easy for a beginner to understand more about the Python programming language and how effectively it can be used.

Python is a powerful programming language which is very easy to learn and fun to code in. You do not have to wrestle with curly brackets or keep track of where each bracket opens and closes; the coding is easy, with simple indentation delimiting a code block, as below.

>>> if 1 == 1:       # the colon starts the block
...     print("Here it is indentation, it's nothing but one tab space")
...
Here it is indentation, it's nothing but one tab space

The speciality of Python is that it embraces the object-oriented programming (OOP) approach, so I can write a class that is referenced through objects, reuse code, hide data and inherit a class into a child class.
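
A minimal sketch of that idea (the class names here are made up purely for illustration):

>>> class Resource(object):
...     def __init__(self, name):
...         self.name = name              # data kept on the object
...     def skill(self):
...         return "scripting"
... 
>>> class PythonResource(Resource):       # child class inheriting from Resource
...     def skill(self):
...         return "scripting + parsing"
... 
>>> PythonResource("py").skill()
'scripting + parsing'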

Python has the capability to recognize data types such as number, string, float, etc.; you are not required to declare them anywhere in the code, but you do need to define a variable before referring to it.

>>> a=10
>>> type(a)
<type 'int'>    #recognize the data as integer.
>>> b='Vinoth'
>>> type(b)
<type 'str'>    #recognize the given data as string since the data assigned enclosed in quotes. 
>>> c=12.22
>>> type(c)
<type 'float'>  #recognize the data as float.
>>> d='python','shell'
>>> type(d)
<type 'tuple'>  #recognize as tuple compound data type. Will discuss about this more on upcoming articles.
>>> e = ["Python",3]
>>> type(e)
<type 'list'>   #recognize as list compound data type. Will discuss about this more on upcoming articles.

# marks a comment in the code

Maybe, if you are good with the vocabulary (syntax), you can finish your task in one line of code. Python is dynamically typed and interpreted, which means types are resolved and code is executed at run time, so there is no separate compilation step.

Memory management is handled automatically in Python. Memory management is a vital part of coding in other programming languages, like malloc & free in C and garbage collection in Java. Because Python handles memory management for them, some developers have picked up the nickname of "poor programmer".

Memory management is the task of allocating memory when a new object is created and releasing it after the object has been used, so it can be passed on to another object for reuse.

Like other programming languages, Python also has control flow tools such as,

IF: To make a decision and direct the control flow accordingly.

>>> iam="Python"
>>> if iam == 'Python': print("Easy to learn")
Easy to learn
>>> #Now I will change the value of "iam" to "shell" and follow the changes
>>> iam="shell"
>>> if iam == 'Python': print "Easy to learn"
... else: print "I already know"
... 
I already know

FOR: It is like the for loop in shell scripting; we can iterate over a sequence and direct the code flow as below,

>>> iam=["Python","Shell"]

>>> for lan in iam: #Its easy to iterate the sequence and flow the code accordingly.
...     if lan == "Python": 
...         print lan+" is Easy to Learn" 
...     else: 
...         print lan+" is already known"
... 
Python is Easy to Learn
Shell is already known

We also have other control statements such as break, continue & pass, which we will discuss whenever they are required in coding practice.

Feeding more into this one article would bore you, so I will stop here and let you explore further. You can also comment with your doubts, queries and logic.

Python-Subprocess

Before I start the topic, I would like to begin with a real-time scenario…

I have gone through challenges in managing the people I call in when I plan to do alterations at my home:

  1. Sometimes I don’t know whether they are working,
  2. How they are working,
  3. What is the progress of the work,
  4. How many hours they are taking to complete the work,
  5. What they want,
  6. Why they are not working,
  7. What do they expect in order to finish the work?

But now it is very easy to manage any kind of people that I call. Mr. Python suggested to me a person named Subprocess. Subprocess is a very decent person for handling the people (processes) that I want to call; I simply instruct subprocess with simple in & out variables.

Now let’s know more about subprocess:

Subprocess is specifically used when there is a requirement to call external OS-specific commands, an external program or any executable, whether it is parameterized or requires user input to be passed on its console.

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

ls -lrt   -- a simple, well-known Unix command

Now there is a requirement for me to call this command from my Python code. There are many ways to call it, but I want to know the best way, so that I can pass any input it may require, fetch the output it produces, or catch any error that occurs once I spawn it from my code. Yes, the best way of calling this command is subprocess.

Why? The example below will shed some light.

Vinoth.Durairaj@local:~$ ls -lrt
total 16
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins  242 May 21 17:32 local1.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1491 May 21 17:42 local2.sh
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1809 Jul 10 15:43 local3.sql
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 2663 Jul 16 16:58 local4.sh
Vinoth.Durairaj@wp1rngs02t:~$ echo $? -- previous command exit status
0    --- It gives exit status "0" because the previous command succeeded.

Let me pass an invalid option to check the exit status.

Vinoth.Durairaj@local:~$ ls -z --- passing invalid option to check the exit status
ls: invalid option -- 'z'
Try `ls --help' for more information.
Vinoth.Durairaj@wp1rngs02t:~$ echo $? -- previous command exit status
2     --- It gives exit status "2" because the previous command failed.

Now let me execute this command from Python as per the requirement, but this time I am going to use the most commonly known way, os.system.

>>> import os
>>> os.system('ls -lrt')
total 16
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins  242 May 21 17:32 local1.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1491 May 21 17:42 local2.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1809 Jul 10 15:43 local3.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 2663 Jul 16 16:58 local4.py
0   ---- the command executed, and os.system also reported 0.

Let's pass the same invalid option to the command called via os.system, to see what return code it gives back to my Python code.

>>> os.system('ls -z')    --- now I am passing the invalid option to check how faithfully os.system reports the status to me.
ls: invalid option -- 'z'
Try `ls --help' for more information.
512     

Oops, what is that 512? That is not the exit status I am expecting, and it does not match the return code I received when I executed the command directly. (os.system actually returns the raw wait status, in which the real exit code sits in the high byte: 512 >> 8 == 2.) So this is not the best option to use in my Python code.
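
If you do want to recover the real exit code from that wait status, a minimal sketch (POSIX only):

>>> status = os.system('ls -z')
ls: invalid option -- 'z'
Try `ls --help' for more information.
>>> os.WEXITSTATUS(status)   -- decode the exit code from the wait status returned by os.system
2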

OK. So as I said, subprocess is the best way of calling this command. How? Let's see!!

>>> import subprocess
>>> subprocess.call(["ls","-lrth"])
total 16
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins  242 May 21 17:32 local1.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1491 May 21 17:42 local2.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1809 Jul 10 15:43 local3.py
-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 2663 Jul 16 16:58 local4.py
0   ---- the command executed, and subprocess.call returned 0, the exit status of the command.

No change in the output when we compare it with os.system.

>>> subprocess.call(["ls","-z"])    --- Now i am trying to give invalid option to check how truthful the os.command to me.
ls: invalid option -- 'z'
Try `ls --help' for more information.
2   

Ah, this is what I expect in my code. So subprocess is the right option.

I learned that subprocess is the right module for my requirements, so let's explore it further; to do so we can use dir(), which works for any module in Python, as below.

>>> dir(subprocess)
['CalledProcessError', 'MAXFD', 'PIPE', 'Popen', 'STDOUT', 'TimeoutExpired', '_PIPE_BUF', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_active', '_cleanup', '_demo_posix', '_demo_windows', '_eintr_retry_call', '_has_poll', 'call', 'check_call', 'errno', 'fcntl', 'gc', 'list2cmdline', 'mswindows', 'os', 'pickle', 'select', 'signal', 'sys', 'time', 'traceback', 'types']

We will go through only the callable objects.

subprocess.call: As in the example, any process/program/command can be executed using this function. But what if an error occurs and I want to catch it as an exception in my code? Fine, we can use another function called check_call.

subprocess.check_call: I can use this if I want a failure to jump directly to an exception. What if I want to fetch the output from the command? Yes, we can use another function, check_output.

>>> subprocess.check_call(["ls","-z"])
ls: invalid option -- 'z'
Try `ls --help' for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ls', '-z']' returned non-zero exit status 2

subprocess.check_output: I can use this if I am expecting output from the process/program that I called from my code. If the called process produces output, check_output returns it, and if the called process throws an error, my code flows to an exception which can be caught as CalledProcessError.
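
A minimal sketch of check_output (available from Python 2.7 onwards):

>>> out = subprocess.check_output(["ls", "-lrt"])   -- returns the command's standard output as a string
>>> try:
...     subprocess.check_output(["ls", "-z"])
... except subprocess.CalledProcessError as e:
...     print "command failed with exit status", e.returncode
... 
ls: invalid option -- 'z'
Try `ls --help' for more information.
command failed with exit status 2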

The next frequently used item in subprocess is Popen. Popen is very helpful for developers who use this module and want to work with external process signals and pipes.

subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)

With this Popen object, I can easily communicate with, kill, poll, terminate, send a signal to and wait on the process/program that I have called from my Python code.

OK, let me try Popen with the same 'ls' command.

>>> subprocess.Popen(["ls"])
local1.py local2.py local3.py local4.py 

The output is as expected. But how can I communicate with the process/program?

>>> ret = subprocess.Popen(["ls"])
>>> local1.py local2.py local3.py local4.py
>>> ret.communicate()
(None, None)

communicate() is a method of the subprocess.Popen object, and it always returns its result as a tuple. But in the above case the output was only (None, None). Why? To communicate with the external process, we first need to construct pipes for the three streams: stdin, stdout & stderr.

OK, let's see how to make the PIPE connection.

>>> ret = subprocess.Popen(["ls","-lrt"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> ret.communicate()
(b'total 16\n-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins  242 May 21 17:32 local1.py\n-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1491 May 21 17:42 local2.py\n-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 1809 Jul 10 15:43 local3.py\n-rw-r--r-- 1 Vinoth.Durairaj grpAppAdmins 2663 Jul 16 16:58 local4.py\n', b'')

So here I have made the PIPE connections on the subprocess instance. Now when I call communicate(), it gives me output in tuple format, where the first element is the output of the command and the second element is the error output, which is blank now. How does it work?

stdin: to send input to your process, such as parameters, values or user input.

stdout: to read your process's standard output until end-of-file / the process terminates.

stderr: to read your process's error output until end-of-file / the process terminates.

E.g., I have a program that expects the user to input "1" to say "Hi", "2" to say "Hello" and "3" to exit. Now I can easily call that program from subprocess.Popen and communicate with it by sending "1", "2" & "3".

ret = subprocess.Popen(["hifi_say.pl"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE, shell=False)
ret.stdin.write('1\n')   # choose option 1 to get "Hi"
ret.stdin.write('3\n')   # choose option 3 to exit the program
out, err = ret.communicate()
print out
Hi

In the above example, I passed "1" to get "Hi" and "3" to exit the process. This is a simple example of how we can communicate with a process/program or command that we call from our Python code. Hope this has given you some idea about subprocess. I leave the rest to you to explore more about subprocess and play with it in your own development requirements.