Hello! In this blog post I will walk through my experience creating a CLI gem.
With Adobe Flash no longer being supported, I remembered a website that utilized it heavily. Newgrounds hosted many great Flash movies and games, so I decided to use it as part of my simple scraper project, which would display the featured movies section.
To begin, I reviewed many previous labs for info, such as scraping with Nokogiri, and watched tutorials on how to get started with a CLI gem. Once I had the basic setup and knowledge, I began playing around with Nokogiri to see how hard it would be to scrape the website accurately. Initially I tried to scrape the individual elements for each movie, but I could not pinpoint all the info I wanted and kept ending up with long strings of useless text.
Eventually, as I got more acquainted with Nokogiri’s search methods, I was able to gather portions that were only halfway useful: one string of text containing both the title and the author, and another containing the description and genre, each with formatting issues. This would not do, as I wanted only descriptions in the description array, only creators in the creator array, and so on. I easily could have spent more time trying to find the perfect HTML search to pinpoint the exact strings I needed, but I decided to try something else.
Once I acquired the data that contained both the title and author, I split it up into its own array. This allowed me to combat the formatting issue, as I could remove any whitespace or unnecessary characters by searching the array and deleting those elements. This left me with an array holding the titles at the even indexes and the authors at the odd ones. I looped through this array, checked whether each index was odd or even, and added the element to the appropriate array based on the result. I repeated this process for the genre and description as well.
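A minimal sketch of that split-and-sort idea, written in plain Ruby so it runs without the scraper. It is an illustration, not the gem's actual code: the sample string and the method name `split_titles_and_authors` are made up, and it assumes the scraped text separates titles and authors with newlines.

```ruby
# Hypothetical helper: turns one scraped string into two arrays.
# Assumes titles and authors alternate and are separated by newlines.
def split_titles_and_authors(raw)
  # Break the string apart, trim each piece, and drop pieces that are empty.
  cleaned = raw.split("\n").map(&:strip).reject(&:empty?)

  titles = []
  authors = []

  # Even indexes hold titles, odd indexes hold authors.
  cleaned.each_with_index do |text, index|
    if index.even?
      titles << text
    else
      authors << text
    end
  end

  [titles, authors]
end

# Made-up sample text standing in for what a scrape might return.
sample = "\n  Movie One \n\n  AuthorA \n  Movie Two \n\n  AuthorB \n"
titles, authors = split_titles_and_authors(sample)
p titles   # ["Movie One", "Movie Two"]
p authors  # ["AuthorA", "AuthorB"]
```

Because everything depends on index parity, a single leftover element shifts every title and author after it into the wrong array.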
However, it was not entirely this simple. While I was trying to remove whitespace and the like, some of it would stick around within the array of titles and authors. This wouldn’t do, as the index was vital in separating the titles and authors into their own arrays. I tried checking for issues with my whitespace search, but ended up realizing there was a pattern to exactly where the whitespace would stick around even after the majority had been removed. This pattern allowed me to work around the remnants of whitespace and create the appropriate arrays by explicitly stating “at index of 4, add ___ element to (appropriate array).”
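One way to express that workaround is sketched below. It is not the gem's code: the method name `split_with_known_strays`, the sample data, and the choice of index 4 as the only stray position are all assumptions for illustration. Instead of assigning elements one index at a time, it skips the known stray positions and counts only the elements it keeps, which has the same effect.

```ruby
# Hypothetical helper: splits alternating titles and authors while
# skipping positions where leftover whitespace is known to appear.
# The default of [4] is an assumed position, used only as an example.
def split_with_known_strays(items, stray_indexes = [4])
  titles = []
  authors = []
  kept = 0 # counts only the elements we keep, so parity stays correct

  items.each_with_index do |text, index|
    next if stray_indexes.include?(index) # skip the known leftovers

    (kept.even? ? titles : authors) << text
    kept += 1
  end

  [titles, authors]
end

# Made-up data with a stray whitespace element at index 4.
items = ["Movie One", "AuthorA", "Movie Two", "AuthorB", " ", "Movie Three", "AuthorC"]
titles, authors = split_with_known_strays(items)
p titles   # ["Movie One", "Movie Two", "Movie Three"]
p authors  # ["AuthorA", "AuthorB", "AuthorC"]
```

Hard-coded positions like this are fragile: if the page layout changes, the stray elements may move and the arrays will fall out of step again.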
After figuring out how to acquire the correct info, I referenced it within the CLI class at the appropriate points. I created a basic menu loop that displays the titles and authors first, then the description, genre, and URL based on user input.
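A menu loop along those lines might look like the sketch below. The method name `run_menu`, the hash keys, the prompt text, the `exit` command, and the sample movie are all hypothetical. Input and output are passed in as parameters so the loop can be tried without a terminal.

```ruby
require "stringio"

# Hypothetical menu loop: lists titles and authors, then shows the
# details of whichever movie the user picks by number.
def run_menu(movies, input: $stdin, output: $stdout)
  loop do
    movies.each_with_index do |movie, i|
      output.puts "#{i + 1}. #{movie[:title]} by #{movie[:author]}"
    end
    output.puts "Enter a number for details, or 'exit' to quit:"

    line = input.gets
    break if line.nil? # input ended

    choice = line.strip.downcase
    break if choice == "exit"

    index = choice.to_i - 1
    if choice.match?(/\A\d+\z/) && index.between?(0, movies.length - 1)
      movie = movies[index]
      output.puts movie[:description]
      output.puts "Genre: #{movie[:genre]}"
      output.puts "URL: #{movie[:url]}"
    else
      output.puts "Sorry, I didn't understand that."
    end
  end
end

# Made-up data and scripted input in place of a real scrape and keyboard.
movies = [
  { title: "Movie One", author: "AuthorA", description: "A short film.",
    genre: "Comedy", url: "https://example.com/movie-one" }
]
run_menu(movies, input: StringIO.new("1\nexit\n"))
```

In a real gem the same method would be called with the defaults, so it reads from the keyboard and prints to the terminal.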
I learned a lot about the flow of creating a CLI gem and about parsing data in order to separate it correctly, and I became much more acquainted with scraping.