Strings in Java


A few days ago I needed to extract all the strings from a bunch of .java files, and I thought it would also be a good idea to keep count of how many times each string is used. So I came up with this simple Python script. It’s a quick and dirty solution, but it met my needs for that particular task.

import sys, os, re
from operator import itemgetter

files = []
strings = {}
exp = re.compile("(\".+?\")")

def klist(bdir):
    # Recursively collect every .java file under bdir into the global list.
    for fname in os.listdir(bdir):
        path = os.path.join(bdir, fname)
        if os.path.isdir(path):
            klist(path)
        elif fname.endswith(".java"):
            files.append(path)

def get_strings(fname):
    with open(fname) as fp:
        data = fp.readlines()
    print(os.path.basename(fname) + ":")

    for line in data:
        k = 0
        while k < len(line):
            m = exp.search(line, k)
            if m is not None:
                fstr = m.group(1)
                print("    " + fstr)
                # Bump the global usage count for this literal.
                strings[fstr] = strings.get(fstr, 0) + 1
                k = m.end()
            else:
                k = len(line)

if __name__ == "__main__":
    if len(sys.argv)<2:
        print("Usage: get_strings.py base_directory")
        sys.exit(1)

    klist(sys.argv[1])
    for fname in files:
        get_strings(fname)

    print("-" * 70)
    # Least-used strings first, most-used last.
    for k, v in sorted(strings.items(), key=itemgetter(1)):
        print("%d : %s" % (v, k))

So basically this gathers the strings, prints them out per file, and then, after a separator line, prints some usage stats: each string with the number of times it occurs, least-used first. It might contain bugs because I wrote it in a hurry, so if you use it, do so at your own risk ;)
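
For anyone who wants something a bit sturdier, here is a rough Python 3 sketch of the same gather-and-count idea. It is not the script above, just one way it could be tightened up, and it assumes a few things the original does not: os.walk for the directory traversal, collections.Counter for the tallies, and a regex that tolerates empty strings and escaped quotes. The names STRING_RE, find_strings and count_strings are made up for this example. It still knows nothing about comments or char literals, so treat it as quick and dirty too.

```python
import os
import re
from collections import Counter

# Assumed pattern (not the one from the original script): a double-quoted
# literal whose body is any run of escape sequences (such as \" or \\)
# or characters other than a quote or a backslash. Empty strings match too.
STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"')


def find_strings(text):
    """Return every double-quoted literal found in text, quotes included."""
    return STRING_RE.findall(text)


def count_strings(base_dir):
    """Walk base_dir and tally the literals found in every .java file."""
    counts = Counter()
    for root, _dirs, names in os.walk(base_dir):
        for name in names:
            if not name.endswith(".java"):
                continue
            path = os.path.join(root, name)
            # errors="replace" keeps odd encodings from aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as fp:
                for line in fp:
                    counts.update(find_strings(line))
    return counts
```

Sorting the result with sorted(counts.items(), key=lambda kv: kv[1]) gives the same least-used-first listing as the stats part of the script, while counts.most_common() would put the most frequent strings on top instead.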