Simple File Comparison

Nov 03, 2009 November Post-A-Day Python

Without going into too many specifics, one of the things I do on a weekly basis is intake records from one of our vending partners, massage the data a little, and update one of our own tables based on the results. Basically, what I get from this vendor each week is a list of codes with some additional information attached to each record.

The reason this painfully simple script works for me is that the list of codes doesn't change from week to week - each new file has an identical number of records. What changes is the additional data - sales numbers associated with each code. Your mileage may vary.


#!/usr/bin/python

# usage: ./compare.py pwdata102409.txt pwdata103109.txt

import sys, datetime

# pass in the names of the two files to be compared
file1 = sys.argv[1]
file2 = sys.argv[2]

# open both files
f1 = open(file1, "r")
f2 = open(file2, "r")

# read both files
fileOne = f1.readlines()
fileTwo = f2.readlines()

# then close them ...
f1.close()
f2.close()

# open output files and give them date-specific names 
todaysdate = datetime.datetime.now().strftime("%Y%m%d")
outFile1 = open(todaysdate + "_results.txt", "w")
outFile2 = open(todaysdate + "_updates.txt", "w")

# loop over the contents of the first file, find differences in the second file, and write those to the first output
x = 0
for i in fileOne:
    if i != fileTwo[x]:
        outFile1.write(i.rstrip() + " != " + fileTwo[x].rstrip() + "\n")

        # this second output is just so that I can eyeball the updates myself before I run them 
        # against a master table that keeps the most current records of all the code activity
        code = fileTwo[x].split("|")[0]
        codes_used = fileTwo[x].split("|")[3]
        code_limit = fileTwo[x].split("|")[4].rstrip()
        outFile2.write("UPDATE used_codes SET codes_used='" + codes_used + "', code_limit='" + code_limit + "' WHERE passcode = '" + code + "';" + "\n")
    x += 1

print "number of records changed: " + str(x)

# ... aaaand then close the output files
outFile1.close()
outFile2.close()


One thing I usually do but haven't here is move the generated files to an archive location - for now, I'm the only person who needs access to these files, so I don't bother moving them out of the folder where the script runs. But if I were to do that, it would look a little something like this:


processed = "/some/filepath/"
shutil.move(outputfile, processed)