For my third project at Metis Data Science Bootcamp, I tried to predict rain with classification. The result of those two weeks is rainOne, a classification algorithm that compares my prediction against the weather forecast from OpenWeatherMap.org.
Meteorologists generally determine the forecast by looking at multiple sources of data. They look at movements of pressure systems, analyze cloud cover, and aggregate information from many different sensors and stations. What if they only had the information that one individual weather station provided? Can the weather still be predicted with only a station’s record of the past conditions? …
Have you ever watched a movie trailer and seen an actor or actress who made you more eager to see the film? What about spotting your favorite director on a film’s roster? For my second project at Metis, I tried to quantify that feeling to predict the success of a movie.
Data for this project was scraped from BoxOfficeMojo, with a focus on domestic film rankings. I gathered the link to every movie on each of BoxOfficeMojo’s yearly tables from 2000 to 2019. On each movie’s landing page, I used BeautifulSoup to scrape the basic details. To scrape the cast and crew information, I needed Selenium to click through the page and reveal that information. …
This week I was fortunate enough to work with some peers on exploratory data analysis (EDA) of the publicly available Metropolitan Transportation Authority (MTA) Turnstile Data from New York City. Our mission was to analyze the data and give an imaginary organization, WomenTechWomenYes (WTWY), insight into where to solicit email engagement for an upcoming gala. Before we could make any recommendations, we had to engage with the data.
The dataset is published in weekly updates, each containing roughly 200,000 rows of data. A quick trip to the MTA’s site helps to decode the information provided in each row.
Each row of the dataset represents the data recorded by a single turnstile. Each turnstile has a unique identifier built from a combination of its Station, Remote Unit, Control Area (C/A), and Subunit Channel Position. To enter or exit a subway station, one usually needs to pass through a turnstile. The turnstiles record a running tally of the number of entries and exits once every four hours. …
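Because each reading is a running tally rather than a count for the period, the traffic in a given four-hour window is the difference between consecutive readings from the same turnstile. Here is a minimal sketch of that idea in plain Python. It is an illustration rather than the code from our analysis: the sample rows, identifier values, and the `interval_counts` helper are all made up for the example, and it does not attempt to handle irregularities such as a counter resetting.

```python
from collections import defaultdict

# Hypothetical sample rows (values invented for illustration):
# (C/A, Remote Unit, Subunit Channel Position, Station, timestamp,
#  cumulative entries, cumulative exits)
rows = [
    ("CA1", "U1", "SCP1", "STATION A", "2019-01-01 00:00", 1000, 400),
    ("CA1", "U1", "SCP1", "STATION A", "2019-01-01 04:00", 1012, 405),
    ("CA1", "U1", "SCP1", "STATION A", "2019-01-01 08:00", 1050, 440),
    ("CA1", "U1", "SCP2", "STATION A", "2019-01-01 00:00", 500, 90),
    ("CA1", "U1", "SCP2", "STATION A", "2019-01-01 04:00", 507, 93),
]


def interval_counts(rows):
    """Convert each turnstile's running tallies into per-interval counts."""
    # Group readings by the composite key that identifies one turnstile.
    by_turnstile = defaultdict(list)
    for ca, unit, scp, station, ts, entries, exits in rows:
        by_turnstile[(ca, unit, scp, station)].append((ts, entries, exits))

    result = []
    for key, readings in by_turnstile.items():
        # "YYYY-MM-DD HH:MM" strings sort chronologically as plain text.
        readings.sort()
        # Subtract each reading from the next one for the same turnstile.
        for (_, e0, x0), (ts, e1, x1) in zip(readings, readings[1:]):
            result.append((key, ts, e1 - e0, x1 - x0))
    return result


counts = interval_counts(rows)
for key, ts, entries, exits in counts:
    print(key, ts, entries, exits)
```

The first reading for each turnstile has nothing before it to subtract, so it produces no interval count; five sample rows across two turnstiles yield three intervals.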