7.6 Calculating Apache Hits per IP Address
Credit: Mark Nenadov

7.6.1 Problem
You need to examine an Apache log file to find the number of hits recorded from each individual IP address that accessed the server.

7.6.2 Solution
Many of the chores of administering a web server have to do with analyzing Apache logs, which Python makes easy:

    def CalculateApacheIpHits(logfile_pathname):
        # Make a dictionary to store IP addresses and their hit counts
        IpHitListing = {}
        # Read the log line by line. Iterating over the file object avoids
        # loading a huge log into memory at once (the older .xreadlines()
        # method served the same purpose but was removed in Python 3)
        with open(logfile_pathname, "r") as Contents:
            # Go through each line of the logfile
            for line in Contents:
                # Split the string to isolate the IP address
                Ip = line.split(" ")[0]
                # Ensure length of the IP address is proper (see discussion)
                if 6 < len(Ip) <= 15:
                    # Increase by 1 if IP exists; else set hit count = 1
                    IpHitListing[Ip] = IpHitListing.get(Ip, 0) + 1
        return IpHitListing
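Once the counts are in a dictionary, ranking or filtering them is a short step. The following is a minimal sketch, not part of the original recipe: the helper names top_hitters and non_local_hits are hypothetical, the sample counts are made up for illustration, and the code assumes a dictionary shaped like the one CalculateApacheIpHits returns.

```python
def top_hitters(ip_hits, n=10):
    """Return the n (ip, hits) pairs with the most hits, busiest first."""
    # Sort by hit count descending, then by address so ties are deterministic
    ranked = sorted(ip_hits.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def non_local_hits(ip_hits, local_ips=("127.0.0.1",)):
    """Total the hits that did not come from the given local addresses."""
    return sum(count for ip, count in ip_hits.items() if ip not in local_ips)


# Hypothetical counts, shaped like the dictionary the recipe returns
sample = {"127.0.0.1": 12, "10.0.0.5": 40, "192.168.1.20": 7}

print(top_hitters(sample, 2))   # [('10.0.0.5', 40), ('127.0.0.1', 12)]
print(non_local_hits(sample))   # 47
```

Keeping these helpers separate from the counting function means the log is read only once, and the same dictionary can feed several reports.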
7.6.3 Discussion
This recipe shows a function that returns a dictionary containing the hit counts for each individual IP address that has accessed your Apache web server, as recorded in an Apache log file. For example, a typical use would be:

    HitsDictionary = CalculateApacheIpHits("/usr/local/nusphere/apache/logs/access_log")
    print(HitsDictionary["127.0.0.1"])

This function is useful for many things. For example, I often use it in my code to determine the number of hits that actually originate from locations other than my local host. It can also be used to chart which IP addresses are most actively viewing pages served by a particular installation of Apache.

This function performs a modest validation of each IP address, which is really just a length check:

    if 6 < len(Ip) <= 15:
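To make concrete what such a length test lets through, the sketch below compares it with a stricter pattern-based test. It is an illustration only, not part of the original recipe: the names passes_length_check and is_dotted_quad are hypothetical, and the stricter test assumes IPv4 dotted-quad addresses.

```python
import re

# Four groups of one to three ASCII digits separated by dots, nothing else
_DOTTED_QUAD = re.compile(
    r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\Z"
)


def passes_length_check(token):
    """The recipe's cheap test: only the length of the token is examined."""
    return 6 < len(token) <= 15


def is_dotted_quad(token):
    """A stricter test (assumption: IPv4 only): four octets, each 0-255."""
    match = _DOTTED_QUAD.match(token)
    return bool(match) and all(int(octet) <= 255 for octet in match.groups())


# Print each token with the verdict of the cheap test and the strict test
for token in ("127.0.0.1", "1.2.3.4", "-", "localhost", "999.999.999.999"):
    print(token, passes_length_check(token), is_dotted_quad(token))
```

Tokens such as localhost and 999.999.999.999 pass the length test but fail the stricter one, while a lone "-" is rejected by both.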
The purpose of this check is not to enforce any stringent validation (for that, we could use a regular expression), but rather to reduce, at extremely low runtime cost, the probability of obviously garbage data getting into the dictionary. The bounds 6 < len(Ip) <= 15 match IPv4 dotted-quad notation, which runs from 7 characters (0.0.0.0) to 15 characters (255.255.255.255). As a general technique, performing low-cost, highly approximate sanity checks on data that is expected to be okay (but one never knows for sure) is worth considering.

7.6.4 See Also
The Apache web server is available and documented at http://httpd.apache.org.