Profiling Python

Profiling an application means building up a picture of what is going on behind the scenes while your program executes.

The sys module can perform a very simple profiling task by telling you a little about what is going on as each function, method, or individual line gets executed.

sys.setprofile(profiler_function)— This function installs a source code profiler; the function you supply is executed whenever a function or method is called or returns.

sys.settrace(tracer_function)— The functionality of this function is basically the same as that of setprofile(). However, the trace function is also called whenever a new line is executed.

>>> import sys
>>> def profiler(frame, event, arguments):
...     print frame.f_code.co_name, frame.f_lineno, event, arguments
...
>>> sys.setprofile(profiler)
? 1 return None
>>> lst = ["Spam","Parrot","Knights"]
? 1 call None
? 1 return None
>>> def showlist(_lst):
...     for l in _lst:
...         print l
...     return _lst
...
? 1 call None
? 1 return None
>>> showlist(lst)
? 1 call None
showlist 1 call None
Spam
Parrot
Knights
showlist 4 return ['Spam', 'Parrot', 'Knights']
['Spam', 'Parrot', 'Knights']
? 1 return None
>>> sys.setprofile(None)
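The settrace() hook works in much the same way. Because the interpreter also reports each new line, the trace function must return itself (or another local trace function) to keep receiving line events inside a frame. The following is a minimal sketch of such a tracer, reusing the showlist() function from the session above; the names are purely illustrative:

import sys

def tracer(frame, event, arguments):
    # Report the function name, line number, and event for each traced event.
    print frame.f_code.co_name, frame.f_lineno, event, arguments
    return tracer    # returning the tracer keeps line events coming

def showlist(_lst):
    for l in _lst:
        print l
    return _lst

sys.settrace(tracer)
showlist(["Spam", "Parrot", "Knights"])
sys.settrace(None)    # turn tracing off again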

If you really want to perform a more complete and accurate study, you need to use the profile module.

Python Profiler

The information provided here offers a brief overview of how to use the profile module to analyze the run-time performance of a Python program. The original profile module was written by Sjoerd Mullender, and Guido van Rossum later applied some changes to it. All the original documentation is copyrighted by James Roskind (see the copyright note in Appendix C, "Python Copyright Notices"), and is reproduced here with slight modifications.

Note: Check out the module's original documentation for more information about its "deterministic profiling" implementation.

You have two possible ways to use the profile module. The first option is to import it and have it run a function from your program that you want to analyze, such as

import profile

def main():
    for n in xrange(100):
        print n,

profile.run("main()")

The run() function generates a profiling report that can be manipulated using the pstats module, which contains the report-generating functions. The second option is to invoke the profiler as a main program and pass the script that needs to be profiled as an argument:

python profile.py scriptfile [arg...]

Next, you have the functions that are available from the profile module itself. Note that an instance of Profile() is not needed to call them. To profile an application with a main entry point of foo(), you would add the following to your module:

import profile
profile.run("foo()")

The previous action would cause foo() to be run, and a series of informative lines (the profile) to be printed. This approach is most useful when working with the interpreter. If you would like to save the results of a profile into a file for later examination, you can supply a filename as the second argument to the run() function:

import profile
profile.run("foo()",'fooprof')

The primary entry point for the profiler is the global function profile.run(). It is typically used to create any profile information. The reports are formatted and printed using methods for the class pstats.Stats. The following is a description of all these standard entry points and functions. For a more in-depth view of some of the code, consider reading the later section on "Extensions: Deriving Better Profilers," which includes a discussion of how to derive better profilers from the classes presented, or reading the source code for these modules.

FUNCTION profile.run(string, filename_opt)

This function takes a single argument that can be passed to the exec statement, and an optional filename. In all cases, this routine attempts to exec its first argument, and gathers profiling statistics from the execution. If no filename is present, this function automatically prints a simple profiling report, sorted by the standard name string (file/line/function-name) that is presented in each line. The following is a typical output from such a call:

main()
         2706 function calls (2004 primitive calls) in 4.504 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
     43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
      ...

The first line indicates that this profile was generated by the call: profile.run('main()'), and hence the executed string is 'main()'. The second line indicates that 2706 calls were monitored. Of those calls, 2004 were primitive. We define primitive to mean that the call was not induced via recursion.

The next line, Ordered by: standard name, indicates that the text string in the far right column was used to sort the output. The column headings are as follows:

ncalls— The number of calls.

tottime— The total time spent in the given function (excluding time spent in calls to sub-functions).

percall— The quotient of tottime divided by ncalls.

cumtime— The total time spent in this function and all subfunctions (that is, from invocation until exit). This figure is accurate even for recursive functions.

percall— The quotient of cumtime divided by primitive calls.

filename:lineno(function)— The file, line number, and name of each function.

When two numbers are in the first column (for instance, 43/3), the latter is the number of primitive calls, and the former is the actual number of calls. Note that when the function does not recurse, these two values are the same, and only the single figure is printed.
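A recursive function makes this distinction visible. The following sketch (the fib() function is just an illustration) produces a report in which the ncalls column for fib shows a figure such as 25/1: twenty-five calls in all, only one of which was not induced by recursion.

import profile

def fib(n):
    # Deliberately naive recursion, so most calls are induced via recursion.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

profile.run("fib(6)")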

Analyzing Profiles with the pstats Module

The pstats module analyzes the data collected by the Python profile module. The following example demonstrates how we can use this module to manipulate the information generated by the profile module:

>>> import profile, pstats
>>> def main():
...     for n in xrange(3):
...         print n
...
>>> p = profile.Profile()
>>> p.run("main()")
0
1
2
<profile.Profile instance at 7c2c20>
>>> s = pstats.Stats(p)
>>> s.sort_stats("time", "name").print_stats()
         3 function calls in 58.727 CPU seconds

   Ordered by: internal time, function name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   58.727   58.727   58.727   58.727 profile:0(main())
        1    0.000    0.000    0.000    0.000 <stdin>:1(main)
        1    0.000    0.000    0.000    0.000 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
<pstats.Stats instance at 7c2280>
>>>

This module exposes the Stats(filename, ...) class. This class is used for creating reports from data generated by the Profile class. It imports data either by direct access to members of the Profile class, or by reading in a dictionary that was emitted (via marshal) from the Profile class. When you want to review the profile, you should use the methods in the pstats module. Typically you would load the statistics data as follows:

import pstats
p = pstats.Stats('fooprof')

The class Stats (the previous code just created an instance of this class) has a variety of methods for manipulating and printing the data that was just read into "p". When you ran profile.run(), the result of three method calls was printed:

p.strip_dirs().sort_stats(-1).print_stats()

The first method removed the extraneous path from all the module names. The second method sorted all the entries according to the standard module/line/name string that is printed (this is to comply with the semantics of the old profiler). The third method printed out all the statistics. You might try the following sort calls:

p.sort_stats('name')
p.print_stats()

The first call will actually sort the list by function name, and the second call will print out the statistics. The following are some interesting calls to experiment with:

p.sort_stats('cumulative').print_stats(10)

This sorts the profile by cumulative time in a function, and then only prints the ten most significant lines. If you want to understand what algorithms are taking time, the previous line is what you would use. If you were looking to see what functions were looping a lot, and taking a lot of time, you would do

p.sort_stats('time').print_stats(10)

This sorts according to time spent within each function, and then prints the statistics for the top ten functions.

You might also try
p.sort_stats('file').print_stats('__init__')

This will sort all the statistics by filename, and then print out statistics for only the class init methods (because they are spelled with __init__ in them). The sort_stats() method takes an arbitrary number of quoted strings to select the sort order. For example, sort_stats('time', 'name') sorts on the major key of "internal function time", and on the minor key of 'the name of the function'. As one final example, you could try:

p.sort_stats('time','cum').print_stats(.5,'init')

This line sorts stats with a primary key of time and a secondary key of cumulative time, and then prints out some of the statistics. To be specific, the list is first culled down to 50% (.5) of its original size, and then only lines containing "init" are maintained, and that sub-sub-list is printed.

Note: All the print methods take an argument that indicates how many lines to print. If the arg is a floating point number between 0 and 1.0, it is taken as a decimal percentage of the available lines to be printed (for example, .1 means print 10% of all available lines). If it is an integer, it is taken to mean the number of lines of data that you want to have printed.
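For example, assuming p is a Stats object that has already been loaded and sorted as shown earlier, both forms are valid:

p.print_stats(10)     # print only the first 10 lines of the report
p.print_stats(.25)    # print only the first 25% of the available lines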

If you wondered what functions called the previous functions, you could now (p is still sorted according to the last criteria) do

p.print_callers(.5,'init')

You would get a list of callers for each of the listed functions.

All methods from the Stats class return self, so you can string together commands such as

Stats('foo', 'goo').strip_dirs().sort_stats('calls').print_stats(5).print_callers(5)

This class constructor creates an instance of a statistics object from a filename (or set of filenames). Stats objects are manipulated by methods in order to print useful reports.

The file selected by the previous constructor must have been created by the corresponding version of profile. To be specific, there is no file compatibility guaranteed with future versions of this profiler, and there is no compatibility with files produced by other profilers (for example, the standard system profiler).

If several files are provided, all the statistics for identical functions will be coalesced so that an overall view of several processes can be considered in a single report. If additional files need to be combined with data in an existing Stats object, the add() method can be used. This can be used to average out the statistics for a short running program to increase the accuracy.
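As a sketch of that technique (the file names here are just illustrative), you could dump several runs of the same short program to disk and then fold them into one report:

import profile
import pstats

def main():
    for n in xrange(100):
        print n,

# Record several independent runs of the same short program.
profile.run("main()", 'run1.prof')
profile.run("main()", 'run2.prof')
profile.run("main()", 'run3.prof')

# Statistics for identical functions are coalesced across the files,
# which helps average out timing noise for short-running code.
s = pstats.Stats('run1.prof')
s.add('run2.prof', 'run3.prof')
s.strip_dirs().sort_stats('time').print_stats()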

The following methods are exposed by the Stats class.

strip_dirs()— This method for the Stats class removes all leading path information from filenames. It is very useful in reducing the size of the printout to fit within (close to) 80 columns. This method modifies the object, and the stripped information is lost.

After performing a strip operation, the object is considered to have its entries in a random order, as it was just after object initialization and loading. If strip_dirs() causes two function names to be indistinguishable (that is, they are on the same line of the same filename, and have the same function name), the statistics for these two entries are accumulated into a single entry.

add(filename, ...)— This method of the Stats class accumulates additional profiling information into the current profiling object. Its arguments should refer to filenames created by the corresponding version of profile.run(). Statistics for identically named (file, line, name) functions are automatically accumulated into single function statistics.

sort_stats(key, ...)— This method modifies the Stats object by sorting it according to the supplied criteria. The argument is typically a string identifying the basis of a sort (for example, "time" or "name").

When more than one key is provided, additional keys are used as secondary criteria when equality exists in all keys previously selected. For example, sort_stats('name', 'file') will sort all the entries according to their function name and resolve all ties (identical function names) by sorting by filename.

Abbreviations to Use as Sorting Keys

The valid sort keys, as defined by the pstats module, are the following:

calls— Call count.

cumulative— Cumulative time.

file— File name.

module— File name.

pcalls— Primitive call count.

line— Line number.

name— Function name.

nfl— Name/file/line.

stdname— Standard name.

time— Internal time.

Note that all sorts on statistics are in descending order (placing most time-consuming items first), whereas name, file, and line number searches are in ascending order (that is, alphabetical). The subtle distinction between "nfl" and "stdname" is that the standard name is a sort of the name as printed, which means that the embedded line numbers get compared in an odd way. For example, lines 3, 20, and 40 would (if the filenames were the same) appear in the string order "20", "3", and "40". In contrast, "nfl" does a numeric compare of the line numbers. In fact, sort_stats("nfl") is the same as

sort_stats("name","file","line").

reverse_order()— This method for the Stats class reverses the ordering of the basic list within the object. This method is provided primarily for compatibility with the standard profiler. Its utility is questionable now that ascending versus descending order is properly selected based on the sort key of choice.

print_stats(restriction, ...)— This method for the Stats class prints out a report as described in the profile.run() definition. The order of the printing is based on the last sort_stats() operation done on the object (subject to caveats in add() and strip_dirs()).

The arguments provided (if any) can be used to limit the list down to the significant entries. Initially, the list is taken to be the complete set of profiled functions. Each restriction is either an integer (to select a count of lines), or a decimal fraction between 0.0 and 1.0 inclusive (to select a percentage of lines), or a regular expression (to pattern match the standard name that is printed). If several restrictions are provided, they are applied sequentially.

For example

print_stats(.1, "foo:")

would first limit the printing to the first 10% of the list, and then only print functions that were part of filename ".*foo:". In contrast, the following command:

print_stats(".*foo:", .1)

would limit the list to all functions having filenames ".*foo:", and then proceed to only print the first 10% of them.

print_callers(restrictions, ...)— This method for the Stats class prints a list of all functions that called each function in the profiled database. The ordering is identical to that provided by print_stats(), and the definition of the restricting argument is also identical. For convenience, a number is shown in parentheses after each caller to show how many times this specific call was made. A second non-parenthesized number is the cumulative time spent in the function at the right.

print_callees(restrictions, ...)— This method for the Stats class prints a list of all functions that were called by the indicated function. Aside from this reversal of direction of calls (called versus was called by), the arguments and ordering are identical to the print_callers() method.

ignore()— This method of the Stats class is used to dispose of the value returned by earlier methods. All standard methods in this class return the instance that is being processed so that the commands can be strung together. For example,

pstats.Stats('foofile').strip_dirs().sort_stats('cum').print_stats().ignore()

would perform all the indicated functions, but it would not return the final reference to the Stats instance.

Limitations

There are two fundamental limitations on this profiler. The first is that it relies on the Python interpreter to dispatch "call", "return", and "exception" events. Compiled C code does not get interpreted, and hence is invisible to the profiler. All time spent in C code (including built-in functions) will be charged to the Python function that has invoked the C code. If the C code calls out to some native Python code, those calls will be profiled properly.

The second limitation has to do with accuracy of timing information. There is a fundamental problem with deterministic profilers involving accuracy. The most obvious restriction is that the underlying clock is only ticking at a rate (typically) of about .001 seconds. Hence no measurements will be more accurate than that underlying clock. If enough measurements are taken, the error will tend to average out.

Unfortunately, removing this first error induces a second source of error. The second problem is that it "takes a while" from when an event is dispatched until the profiler's call to get the time actually gets the state of the clock. Similarly, there is a certain amount of lag when exiting the profiler event handler from the time that the clock's value was obtained (and then squirreled away), until the user's code is once again executing. As a result, functions that are called many times, or call many functions, will typically accumulate this error.

The error that accumulates in this fashion is typically less than the accuracy of the clock (that is, less than one clock tick), but it can accumulate and become very significant. This profiler provides a means of calibrating itself for a given platform so that this error can be probabilistically (that is, on the average) removed. After the profiler is calibrated, it will be more accurate (in a least squares sense), but it will sometimes produce negative numbers (when call counts are exceptionally low, and the gods of probability work against you). Do not be alarmed by negative numbers in the profile.

They should only appear if you have calibrated your profiler, and the results are actually better than without calibration.

Calibration

The profiler class has a hard-coded constant added to each event handling time to compensate for the overhead of calling the time function, and storing away the results. The following procedure can be used to obtain this constant for a given platform.

import profile
pr = profile.Profile()
pr.calibrate(100)
pr.calibrate(100)
pr.calibrate(100)

The argument to calibrate() is the number of times to try to do the sample calls to get the CPU times. If your computer is very fast, you might have to do:

pr.calibrate(1000)
or even
pr.calibrate(10000)

The object of this exercise is to get a fairly consistent result. When you have a consistent answer, you are ready to use that number in the source code. For a Sun Sparcstation 1000 running Solaris 2.3, the magical number is about .00053. If you have a choice, you are better off with a smaller constant, and your results will less often show up as negative in profile statistics.

The following shows how the trace_dispatch() method in the Profile class should be modified to install the calibration constant on a Sun Sparcstation 1000:

def trace_dispatch(self, frame, event, arg):
    t = self.timer()
    t = t[0] + t[1] - self.t - .00053  # Calibration constant
    if self.dispatch[event](frame, t):
        t = self.timer()
        self.t = t[0] + t[1]
    else:
        r = self.timer()
        self.t = r[0] + r[1] - t  # put back unrecorded delta
    return

Note that if there is no calibration constant, the line containing the calibration constant should simply say:

t = t[0] + t[1] - self.t  # no calibration constant

You can also achieve the same results using a derived class (and the profiler will actually run equally fast), but the previous method is the simplest to use. If the profiler were made self-calibrating, it would have made the initialization of the profiler class slower, and would have required some very fancy coding, or else the use of a variable where the constant .00053 was placed in the code shown. This is a very critical performance section, and there is no reason to use a variable lookup at this point when a constant can be used.
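As a sketch of the derived-class approach just mentioned, you could subclass Profile and override trace_dispatch() with the same body shown previously, so that profile.py itself does not need to be edited. This assumes the same internals (self.timer, self.t, and self.dispatch) that the excerpt above relies on, and the constant is still whatever calibrate() reported for your machine:

import profile

class CalibratedProfile(profile.Profile):
    # Identical dispatch logic to the excerpt above, with the calibration
    # constant baked into the subclass instead of into profile.py.
    def trace_dispatch(self, frame, event, arg):
        t = self.timer()
        t = t[0] + t[1] - self.t - .00053  # calibration constant
        if self.dispatch[event](frame, t):
            t = self.timer()
            self.t = t[0] + t[1]
        else:
            r = self.timer()
            self.t = r[0] + r[1] - t  # put back unrecorded delta
        return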

Extensions: Deriving Better Profilers

The Profile class of profile was written so that derived classes could be developed to extend the profiler. If the reader is an avid Python programmer, it should be possible to use it as a model and create similar (and perchance better) profile classes.

If all you want to do is change how the timer is called, or which timer function is used, the basic class has an option for that in the constructor. Simply pass the timer function you want to use into the constructor:

pr = profile.Profile(your_time_func)

The resulting profiler will call your time function instead of os.times(). The function should return either a single number, or a list of numbers (similar to what os.times() returns). If the function returns a single time number or the list of returned numbers has length 2, you will get an especially fast version of the dispatch routine.

Be warned that you should calibrate the profiler class for the timer function that you choose. For most machines, a timer that returns a lone integer value will provide the best results in terms of low overhead during profiling. (os.times is pretty bad because it returns a tuple of floating point values, so all arithmetic is floating point in the profiler.) If you want to substitute a better timer in the cleanest fashion, you should derive a class, and simply put in the replacement dispatch method that better handles your timer call, along with the appropriate calibration constant.
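As a final sketch of supplying your own timer (the busy() function here is only an illustration), you could pass time.clock, which returns a single floating point number and therefore takes the faster dispatch path mentioned earlier:

import profile
import pstats
import time

def busy():
    total = 0
    for n in xrange(100000):
        total = total + n
    return total

# Pass an alternative timer function to the Profile constructor; time.clock
# returns a single number, so the profiler selects its simpler dispatch routine.
pr = profile.Profile(time.clock)
pr.run("busy()")
pstats.Stats(pr).strip_dirs().sort_stats('time').print_stats()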
