Project: EcuaCines

Features:

Displays up-to-date movie descriptions, trailers, and showtimes for all major movie theaters in Quito, Ecuador
Allows users to quickly compare showtimes across theaters instead of loading each theater's web page
Works on mobile devices
Loads faster than any of the corresponding movie theaters' websites
Has at least one Easter egg

Technologies:

Zurb Foundation, jQuery, Font Awesome, and Box2DWeb for the front-end
PHP with Simple HTML DOM and MySQL for the back-end, with cron jobs and Google PageSpeed optimizations
Photoshop for the logo design

Discussion:

The website automatically scrapes all showtime information from each cinema's official website and must then be able to display these showtimes grouped either by movie or by cinema. This may seem like a trivial task, but it turned out to be an interesting algorithmic challenge.
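To make the "grouped by movie or by cinema" requirement concrete, here is a minimal Python sketch (the site itself is written in PHP) that pivots scraped showtimes into both views. The record layout of (cinema, canonical title, time), the function name `group_showtimes`, and the cinema names are hypothetical. It also shows why title matching matters: grouping by movie only works once every theater's spelling of a title maps to the same key.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (cinema, canonical_title, showtime).
Showtime = Tuple[str, str, str]
Nested = Dict[str, Dict[str, List[str]]]


def group_showtimes(records: List[Showtime]) -> Tuple[Nested, Nested]:
    """Build both views of the same data: movie -> cinema -> times and cinema -> movie -> times."""
    by_movie = defaultdict(lambda: defaultdict(list))
    by_cinema = defaultdict(lambda: defaultdict(list))
    for cinema, title, time in records:
        by_movie[title][cinema].append(time)
        by_cinema[cinema][title].append(time)
    # Convert to plain dicts so that looking up a missing key raises instead of silently inserting.
    return (
        {movie: dict(cinemas) for movie, cinemas in by_movie.items()},
        {cinema: dict(movies) for cinema, movies in by_cinema.items()},
    )
```

If two theaters stored the same movie under different keys, the by-movie view would show it twice, which is exactly the problem the matching steps below address.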

Human Steps:

1. Open each movie theater's website.
2. Recognize that "Superman: El hombre de acero", " EL HOMBRE DE ACERO", and "Hombre de Acero" all refer to the same movie.
3. Copy and paste the movie title (picking one of the three variations), the description, and each theater's showtimes into EcuaCines's database.

Robot Steps:

1. Open each movie theater's website.
2. Find the piece of text that represents the movie's title by traversing each website's HTML tags according to hard-coded rules.
3. Recognize that "Superman: El hombre de acero", " EL HOMBRE DE ACERO", and "Hombre de Acero" all refer to the same movie.
4. Traverse the cinema's website according to hard-coded rules to find the movie summary (this only needs to be done once per movie).
5. Traverse the cinema's website according to hard-coded rules to find the showtimes. Extract each time by matching text consisting of one or two digits, followed by a ":" or an "h", followed by two more digits. Assume a 24-hour time format.
6. Save the title, description, and showtimes for each movie, as obtained in steps 3, 4, and 5, into EcuaCines's database.
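The time-matching rule from the showtime step can be sketched as a regular expression. This is a Python illustration, not the site's PHP code; the helper name `extract_showtimes`, the digit lookarounds, and the range check on hours and minutes are additions in this sketch beyond the rule as stated.

```python
import re
from typing import List

# One or two digits, then ":" or "h", then exactly two digits, e.g. "9:45" or "21h30".
# The lookarounds (an assumption of this sketch) keep the pattern from matching
# inside longer digit runs such as "123:456".
TIME_PATTERN = re.compile(r"(?<!\d)(\d{1,2})[:h](\d{2})(?!\d)")


def extract_showtimes(text: str) -> List[str]:
    """Return every showtime found in text as a zero-padded 24-hour "HH:MM" string."""
    times = []
    for hours, minutes in TIME_PATTERN.findall(text):
        h, m = int(hours), int(minutes)
        # Assume a 24-hour clock and drop impossible values such as "25:00".
        if h < 24 and m < 60:
            times.append(f"{h:02d}:{m:02d}")
    return times
```

For example, `extract_showtimes("Sala 3: 14:30, 17h05 y 9:45")` returns `["14:30", "17:05", "09:45"]`. The "3:" in "Sala 3:" is skipped because it is not followed by two digits.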

How can we teach a robot to ignore the differences between the three strings from step 3 while still distinguishing them from other movie titles? Here is what I ended up having the program do, which has worked well in practice:

Trim whitespace from the start and end of movie titles
Remove all accents from letters, e.g. turn every 'á' into 'a' (this accounts for the fact that many people prefer to capitalize "áéíóú" as "AEIOU" instead of "ÁÉÍÓÚ")
TURN THE MOVIE TITLES INTO ALL CAPS (this accounts for variations in capitalization)
Remove common Spanish words like "EL," "LA," and "Y" (keeping only the 'important' words)
Match each movie title with the title from another theater that has the smallest Levenshtein distance to it, but accept the match only if that distance is below a maximum threshold, to avoid false positives when there is no valid match (this accounts for small misspellings and singular/plural variations, e.g. "MONSTERS UNIVERSITY" vs. "MONSTER UNIVERSITY")
If the thresholded Levenshtein method finds no match, pair up movie titles that share a common substring of 8 or more letters (there's a really cool dynamic programming algorithm for efficiently finding the longest common substring of two strings)
If everything else fails, conclude that this movie is unique to this theater
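The matching heuristic above can be sketched end to end in Python. The function names, the stop-word list beyond "EL", "LA", and "Y", the Levenshtein threshold of 3, and running the common-substring check over the normalized titles (spaces included) are all assumptions for illustration; the actual values used by the site are not stated.

```python
import unicodedata
from typing import List, Optional

# Hypothetical stop-word list; only "EL", "LA", and "Y" come from the description above.
STOP_WORDS = {"EL", "LA", "LOS", "LAS", "Y", "DE", "DEL"}
MAX_DISTANCE = 3          # hypothetical Levenshtein threshold
MIN_COMMON_SUBSTRING = 8  # "8-or-more-letter common substring"


def normalize(title: str) -> str:
    """Trim, strip accents, upper-case, and drop common Spanish words."""
    title = title.strip()
    # NFD splits "á" into "a" plus a combining accent; dropping combining marks
    # removes the accent (it also turns "ñ" into "n", harmless since both sides are normalized).
    decomposed = unicodedata.normalize("NFD", title)
    title = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).upper()
    return " ".join(word for word in title.split() if word not in STOP_WORDS)


def levenshtein(a: str, b: str) -> int:
    """Edit distance, keeping only the previous row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest common substring via the O(len(a) * len(b)) DP."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common substring ending at a[i-2] and b[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def match_title(title: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate (another theater's title) judged to be the same movie, or None."""
    if not candidates:
        return None
    key = normalize(title)
    keys = {c: normalize(c) for c in candidates}
    # Step 1: nearest title by edit distance, accepted only under the threshold.
    nearest = min(candidates, key=lambda c: levenshtein(key, keys[c]))
    if levenshtein(key, keys[nearest]) <= MAX_DISTANCE:
        return nearest
    # Step 2: fall back to the candidate sharing the longest substring.
    widest = max(candidates, key=lambda c: longest_common_substring(key, keys[c]))
    if longest_common_substring(key, keys[widest]) >= MIN_COMMON_SUBSTRING:
        return widest
    # Step 3: nothing matched; treat the movie as unique to this theater.
    return None
```

Under these assumptions, "Hombre de Acero" and "Superman: El hombre de acero" normalize to "HOMBRE ACERO" and "SUPERMAN: HOMBRE ACERO". Their edit distance (10) exceeds the threshold, so they are paired by the fallback, which finds the 12-character common substring "HOMBRE ACERO".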

Joaquín Ruales

Math + Code + Math.random()
