Keyword Counter – My First Python Project
Why I built a keyword counter
As a digital marketer, I work on search engine optimization (SEO) fairly often. SEO is a strategy that marketers use to improve the “findability” of content in search engines. Many factors go into optimizing content for SEO, but one of the easier ones to influence is the placement of keywords (the words people use to find content in search engines) on the page.
For one project, I wanted to figure out how our client’s competitors were optimizing their web pages for search engines. More specifically, I needed to know how many times a particular keyword appeared on a page and in which html tags those keywords were located. Although I ended up doing this manually, it was a long, tedious task. I realized that I could use my Python skills to do this analysis more quickly and accurately. I decided to make this my very first independent Python project, just in case I ever wanted to do this task again.
Building the project
I created a program that allows the user to input a keyword and a url. The program then prints out the text of the document and writes a csv file containing the number of times the keyword appears in specific html tags (title, h1-h6, p, a href, img, and meta tags), the keyword count excluding a href matches, the total keyword count, and the url. The user can input another url to analyze, if desired, and those results are added to the original csv file. To exit the program, the user enters an empty string.
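To make the csv step concrete, here is a minimal sketch of how results for several urls could be appended to one file. It is not my original code: the function name append_result, the column names, and the way the totals are computed (a plain sum of the per-tag counts) are assumptions chosen for illustration.

```python
import csv
import os
import tempfile

# Hypothetical column layout: one column per tracked html tag.
TAGS = ["title", "h1", "h2", "h3", "h4", "h5", "h6", "p", "a", "img", "meta"]
FIELDNAMES = ["url", "keyword"] + TAGS + ["total_excluding_a", "total"]


def append_result(csv_path, url, keyword, counts):
    """Append one row of per-tag keyword counts to csv_path.

    counts maps a tag name to the number of matches found in that tag.
    Tags missing from counts are written as 0.
    """
    row = {"url": url, "keyword": keyword}
    for tag in TAGS:
        row[tag] = counts.get(tag, 0)
    total = sum(row[tag] for tag in TAGS)  # assumption: total is a plain sum
    row["total"] = total
    row["total_excluding_a"] = total - row["a"]

    # Write the header only when the file is new or empty.
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row


# Usage: two urls analyzed in a row end up in the same file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "keyword_counts.csv")
    append_result(path, "https://example.com/a", "python", {"title": 1, "p": 3, "a": 2})
    append_result(path, "https://example.com/b", "python", {"h1": 1})
    with open(path, newline="", encoding="utf-8") as f:
        print(f.read())
```

Opening the file in append mode is what lets a second url add a row instead of overwriting the first one.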
To get started, I used sample code from a Python class I had taken earlier and expanded on it. The tricky part about this problem was defining the project scope. Even though I had written other programs in Python, the scope of those problems was well-defined and the steps for solving them were reasonably straightforward. With this problem, I needed to define the problem statement myself, account for any edge cases, and figure out the right variables, data structures, and tools to use.
What I learned
This was a fun project to work on, and I learned some important lessons along the way:
Lesson 1: Clearly define what you want to design and know what you ultimately want to accomplish. Don’t try to code everything at once; consider writing a short spec doc first.
When I began my project, I had a vague idea of what the “desired outcome” looked like, but I did not start out with a project outline. After several failed iterations, I realized that I really needed a better strategy. I then wrote out a rough outline for what I wanted to accomplish with each “section” of the code in comments. This helped me understand “where I was” in the code and what I needed to write next.
For my next project, I will absolutely begin by writing a plan for the different pieces of the project and deciding what each piece should do and, ideally, how I will tackle it.
Lesson 2: Think of the different data structures you might want to use in your solution and how you will use them. Also consider where things might go wrong. Sometimes additional constraints can help you develop better solutions.
Initially, I planned to create a dictionary for each html tag that would include each word within that tag. Then I planned to count the occurrences of a desired keyword within each tag and sum the counts across all tags. Unfortunately, this solution proved to be rather inflexible. It only allowed me to search for single-word keywords, and I knew that there would be occasions where I would want to search for a keyword with multiple words, such as “Python programming.”
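A small sketch shows why the per-word approach breaks down. This is an illustration rather than my original code, and the helper name word_counts is made up for the example. Once the text is split into single words, a two-word keyword can never appear as a key.

```python
import re
from collections import Counter


def word_counts(text):
    """Count single words in text, ignoring case and punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


counts = word_counts("Python programming is fun. I love Python programming.")

print(counts["python"])              # 2
print(counts["python programming"])  # 0, the phrase is never a single key
```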
Ultimately, I decided to allow the user to enter a specific keyword (which could consist of multiple words) and then look for that keyword (using regular expressions) in all of the tags. This solution allowed me to search for more than one word, which made the program more flexible and useful. Plus, it meant that I only needed to create one “master” dictionary of key-value pairs, where each key is an html tag and each value is the number of times the keyword was found within that tag. In the previous method, I needed to create a separate dictionary for each html tag. I think my final solution was less cumbersome.
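Here is a rough sketch of the single “master” dictionary idea. The function names and the sample data are assumptions for illustration, and the matching rules (case-insensitive, whole words only, any amount of whitespace between words) are choices I made for this example rather than a record of what my program did.

```python
import re


def build_keyword_pattern(keyword):
    """Compile a case-insensitive pattern for a one-word or multi-word keyword."""
    words = keyword.split()
    if not words:
        raise ValueError("keyword must contain at least one word")
    # Escape each word so characters like "." are matched literally,
    # and allow any run of whitespace between the words.
    body = r"\s+".join(re.escape(word) for word in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def count_keyword_by_tag(texts_by_tag, keyword):
    """Return one dictionary mapping each tag to its keyword count."""
    pattern = build_keyword_pattern(keyword)
    return {
        tag: sum(len(pattern.findall(text)) for text in texts)
        for tag, texts in texts_by_tag.items()
    }


# Hypothetical input: the text found inside each tag of a page.
sample = {
    "title": ["Python Programming for Marketers"],
    "h1": ["Learn Python programming"],
    "p": ["Python  programming is fun.", "I use Python daily."],
}

counts = count_keyword_by_tag(sample, "python programming")
print(counts)                # {'title': 1, 'h1': 1, 'p': 1}
print(sum(counts.values()))  # 3
```

One caveat: the \b word boundaries assume the keyword starts and ends with a letter or digit, so a keyword like “C++” would need a different boundary rule.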
Lesson 3: Know what tools are at your disposal and use the right tools for the job.
I started this project by trying to parse html documents with regular expressions. This was a mistake. There is a lot of potential variation in html, and regular expressions are too rigid to be used effectively in many cases. Instead I learned that it is better to use a dedicated html parser. I ended up using Beautiful Soup, which worked very nicely. I did use regular expressions to match the user-provided keyword within the document. Regular expressions worked in this situation because there is less potential for variation in a given keyword.
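To show what handing the html to a parser looks like, here is a minimal sketch. Note that it does not use Beautiful Soup: so that it runs without installing anything, it uses the html.parser module from Python's standard library instead. The class name, the function name, and the choice to read img alt text and meta description/keywords content are all assumptions for illustration. It also assumes tags are properly closed, which is something Beautiful Soup handles far more gracefully.

```python
from html.parser import HTMLParser

# Tags whose inner text is collected.
TEXT_TAGS = ("title", "h1", "h2", "h3", "h4", "h5", "h6", "p", "a")


class TagTextCollector(HTMLParser):
    """Collect the text found inside each tracked tag."""

    def __init__(self):
        super().__init__()
        self.texts_by_tag = {tag: [] for tag in TEXT_TAGS + ("img", "meta")}
        self._open = []  # stack of (tag, text chunks) for tags still open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in TEXT_TAGS:
            self._open.append((tag, []))
        elif tag == "img" and attrs.get("alt"):
            # Assumption: for images, the alt text is what gets searched.
            self.texts_by_tag["img"].append(attrs["alt"])
        elif tag == "meta" and attrs.get("content"):
            # Assumption: only description and keywords meta tags matter.
            if (attrs.get("name") or "").lower() in ("description", "keywords"):
                self.texts_by_tag["meta"].append(attrs["content"])

    def handle_data(self, data):
        # Text belongs to every tracked tag that is currently open,
        # so link text inside a paragraph lands in both "p" and "a".
        for _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        # Close the most recently opened tag with this name.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, chunks = self._open.pop(i)
                self.texts_by_tag[tag].append("".join(chunks).strip())
                break


def collect_text_by_tag(html):
    """Parse html and return a dictionary of tag name -> list of texts."""
    parser = TagTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts_by_tag


page = (
    "<html><head><title>Python Programming Tips</title></head>"
    "<body><h1>Python programming</h1>"
    '<p>Read <a href="/more">more about Python</a> here.</p>'
    '<img src="logo.png" alt="Python logo"></body></html>'
)
texts = collect_text_by_tag(page)
print(texts["p"])  # ['Read more about Python here.']
print(texts["a"])  # ['more about Python']
```

Because the link text shows up under both p and a, a total across tags can count the same match twice, which is one reason a count that excludes a href matches is useful.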
It was exhilarating when my program ran as expected. I felt a real sense of accomplishment when version 23 of the code actually worked, not that I was counting (smile). Not only did I get to use my new Python programming skills, but I created a program that could save me literally hours of manual analysis.