Tag Archives: Corsi

Project: Corsi Database – Parsing NHL’s Play-by-Play HTML Game Summary Data

While we have been parsing the larger details of a Play-by-Play (PbP) HTML Game Summary, working out who is on the ice and who is taking shots requires a more detailed parsing of the file. To simplify my effort, I’m using the excellent HtmlAgilityPack, which provides XPath-like functionality to navigate within HTML files. With recent versions of Visual Studio, HtmlAgilityPack is available as a NuGet package, so it’s a very simple add-in. Documentation, on the other hand, is harder to come by; I found that trial and error, together with the XPath tutorials at W3Schools, worked best.
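To make "XPath-like navigation" concrete, here is a minimal sketch of the idea. It is not the HtmlAgilityPack code from this project: it uses Python's standard-library ElementTree as a stand-in, and the markup, the `event` class name and the `parse_events` helper are all made up for illustration. ElementTree also needs well-formed markup, which real NHL pages may not be.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified play-by-play table (not the real NHL markup).
SAMPLE = """
<table>
  <tr><th>#</th><th>Event</th><th>Description</th></tr>
  <tr class="event"><td>1</td><td>FAC</td><td>Faceoff won</td></tr>
  <tr class="event"><td>2</td><td>SHOT</td><td>Wrist shot on goal</td></tr>
  <tr class="event"><td>3</td><td>BLOCK</td><td>Shot blocked</td></tr>
</table>
"""

def parse_events(markup):
    """Return one dict per event row, selected with an XPath-style path."""
    root = ET.fromstring(markup)
    events = []
    # The predicate keeps only rows carrying class="event", skipping the header row.
    for row in root.findall(".//tr[@class='event']"):
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if len(cells) < 3:
            continue  # ignore rows that do not have the expected three cells
        events.append({
            "number": int(cells[0]),
            "type": cells[1],
            "description": cells[2],
        })
    return events

for event in parse_events(SAMPLE):
    print(event)
```

The path expression does the navigation, so the loop body only has to turn cells into fields.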



While I don’t know how fragile the NHL’s PbP HTML standard is, I set my goal to harden the parsing algorithm over 5-10 games. Tonight, I wrote the basic parsing algorithm and nearly made it through the first game of the season before crashing out in the second period with a null reference exception during the player parsing. But it’s a good place to stop because I made significant headway: the base code is pretty solid, but the edges need work.
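A crash like this usually means a lookup found nothing and the code used the empty result anyway. The sketch below shows one way to guard against that. It is an illustration in Python's standard-library ElementTree, not the project's C# code; the rows, the `players` class name and the `players_on_ice` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up rows: the second has no player cell, as a stoppage-type row might.
ROWS = ET.fromstring("""
<table>
  <tr><td class="type">SHOT</td><td class="players"><span>8</span><span>23</span></td></tr>
  <tr><td class="type">PEND</td></tr>
</table>
""")

def players_on_ice(row):
    """Return the jersey numbers listed in a row, or [] if the cell is missing."""
    cell = row.find("td[@class='players']")
    if cell is None:
        # find() returns None when nothing matches; using it unchecked is
        # the Python equivalent of a null reference error.
        return []
    return [span.text.strip() for span in cell.findall("span") if span.text]

on_ice = [players_on_ice(row) for row in ROWS.findall("tr")]
print(on_ice)
```

Returning an empty list keeps the caller's loop simple. Whether a missing cell should be skipped silently or logged is a design choice that depends on how often it turns up across games.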

Parsing PbP using HtmlAgilityPack

I think one more night and I’ll be able to go through one complete game, another night and we should have a polished NHL PbP scraper. Then we’ll refine our database so that we can hold the season data. These first few weeks will be hard because we have to build out our database before we can even do any analysis – but we’re making progress.

Project: Corsi Database – Initial Database Seed and Grid Display

Seeding the database with test data

With the initial database model defined, I created a seed method to load some sandbox data – this seed method is referenced in the system start-up and will create and populate the database if there is nothing present:

Corsi Database Initialization at System Start

Corsi Database Seed Method (called if the database does not exist)

Of course, this whole seeding process is just to kick off the project, so that we have a sandbox of information to test and view. In the future, we’ll add game data every night as it becomes available.
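The seed-on-start-up pattern described above can be sketched as follows. This is an illustration using Python's built-in sqlite3 module rather than Entity Framework and SQL Server. The `game_events` table, its columns and the seed rows are hypothetical, and the sketch checks for an empty table rather than a missing database.

```python
import sqlite3

# Made-up sandbox rows: (id, event_type, team).
SEED_EVENTS = [
    (1, "SHOT", "Team A"),
    (2, "MISS", "Team B"),
]

def initialize(conn):
    """Create the table if needed and seed it only when it is empty.

    Returns True if seed rows were inserted, False otherwise.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS game_events ("
        "id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, team TEXT NOT NULL)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM game_events").fetchone()
    if count > 0:
        return False  # data already present, leave it alone
    conn.executemany(
        "INSERT INTO game_events (id, event_type, team) VALUES (?, ?, ?)",
        SEED_EVENTS,
    )
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
print(initialize(conn))  # first start-up seeds the table
print(initialize(conn))  # later start-ups find data and do nothing
```

Making the routine safe to call on every start-up means nightly data loaded later is never overwritten by the sandbox rows.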

Getting to a View of the Data

We are using the Model-View-Controller (MVC) pattern. The Model is coming together nicely now that we have data in the database. We need to create the Controller and View to allow us to navigate and display the data.

I ran into a bit of a snag: Visual Studio couldn’t auto-generate the scaffolding because I was using MVC 4 with Entity Framework 6. I used the Update section of the NuGet Package Manager to bring in the necessary MVC 5.x upgrade package, and after that Visual Studio built out the scaffolding for me.

I then used NuGet to install Grid.MVC to start making the data look pretty. I also needed to bring in Bootstrap and jQuery to round out the formatting.

I added a logo and voila, we have our first browser-accessible view of the base data that we’ll use to compute Corsi stats.

Initial MVC GameEvents View

Project: Corsi Database – Moving to ASP.NET MVC

With eventual plans to offer a browser-accessible interface to Corsi data, I’ve started migrating the previous WinForm application to an ASP.NET web application, built using the MVC framework. To do this, I’ll be using Visual Studio Express 2013 for Web development, Entity Framework for data management and SQL Server Express Edition 2014 as the database host. Once I get this working on my machine, this should provide a low-resistance path to move to Azure hosting.

After creating the basic MVC project, I’ve used the NuGet Package Manager to add in the smarts for the Entity Framework; this will create an interface to the SQL database that I can work with in code.

Entity Framework

My database will change, but just to get started, I’ve used the Model First approach to rig up the very basic database.

MVC Model First

With this in place, our next step is to seed the data and show the basic information using an MVC View and Controller.

Project: Corsi Database – Step 1

The first step of the Corsi Database is complete. I’ve been able to parse the final game of the 2014 Stanley Cup, as per the data provided within the NHL’s game play-by-play document.

The goal of the Corsi Database is to provide Corsi-type stats for the NHL 2014-2015 season. As a high-level goal, I’m trying to rebuild the infrastructure previously implemented by Extra Skater (Darryl Metcalf), which unfortunately is now offline.
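For readers new to the stat: Corsi counts all shot attempts, meaning goals, saved shots, missed shots and blocked shots, and is often reported as the share of attempts taken by one team. The sketch below shows that arithmetic in Python. It is an illustration only: the event records, the `shooter_team` field and the `corsi` helper are hypothetical, the sample data is made up, and the sketch ignores game situation such as even strength versus power play.

```python
# Event types treated as shot attempts in this sketch.
SHOT_ATTEMPT_TYPES = {"GOAL", "SHOT", "MISS", "BLOCK"}

# Made-up events; "shooter_team" is the team that attempted the shot.
SAMPLE_EVENTS = [
    {"type": "FAC", "shooter_team": None},
    {"type": "SHOT", "shooter_team": "HOME"},
    {"type": "BLOCK", "shooter_team": "HOME"},
    {"type": "MISS", "shooter_team": "AWAY"},
    {"type": "HIT", "shooter_team": None},
    {"type": "GOAL", "shooter_team": "HOME"},
]

def corsi(events, team):
    """Return (attempts for, attempts against, for-percentage) for a team."""
    attempts = [e for e in events if e["type"] in SHOT_ATTEMPT_TYPES]
    corsi_for = sum(1 for e in attempts if e["shooter_team"] == team)
    corsi_against = len(attempts) - corsi_for
    total = corsi_for + corsi_against
    # No attempts at all means the percentage is undefined.
    percentage = 100.0 * corsi_for / total if total else None
    return corsi_for, corsi_against, percentage

print(corsi(SAMPLE_EVENTS, "HOME"))
```

Everything the calculation needs is the event type and the shooting team, which is why the parser has to capture both reliably before any analysis can start.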

Currently, I have this set up as a WinForm application, but will migrate to an ASP.NET MVC application, so that data is more readily accessible.

Phase 1 - Parsing NHL Play-by-Play