Tag Archives: NHL

Project: Corsi Database – Parsing NHL’s Play-by-Play HTML Game Summary Data

So far we have parsed only the larger details of a Play-by-Play (PbP) HTML Game Summary; working out who is on the ice and who is taking shots requires a more detailed parse of the file. To simplify my effort, I’m using an excellent tool called HtmlAgilityPack, which provides XPath-like functionality for navigating within HTML files. With recent versions of Visual Studio, HtmlAgilityPack is available as a NuGet package, so it’s a very simple add-in. Documentation, on the other hand, is harder to come by. I found that trial and error, together with the XPath tutorials at W3Schools, worked best.
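To give a feel for what XPath-style navigation looks like, here is a minimal sketch in Python rather than the C#/HtmlAgilityPack code this project actually uses. It relies on the limited XPath subset in the standard library's `xml.etree.ElementTree`, which only accepts well-formed markup, so the sample table, the `evenColor` class name, and the `event_rows` helper are all made up for illustration and are not the NHL's real layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a play-by-play table.
# Real pages may not be well-formed XML; this only illustrates the XPath idea.
SAMPLE = """
<table>
  <tr class="heading"><td>#</td><td>Per</td><td>Event</td><td>Description</td></tr>
  <tr class="evenColor"><td>1</td><td>1</td><td>FAC</td><td>Faceoff won at center ice</td></tr>
  <tr class="evenColor"><td>2</td><td>1</td><td>SHOT</td><td>Wrist shot on goal</td></tr>
</table>
""".strip()


def event_rows(markup):
    """Return the cell text of every <tr> whose class attribute is 'evenColor'."""
    root = ET.fromstring(markup)
    rows = []
    # The predicate [@class='evenColor'] filters rows by attribute value.
    for tr in root.findall(".//tr[@class='evenColor']"):
        rows.append([(td.text or "").strip() for td in tr.findall("td")])
    return rows


rows = event_rows(SAMPLE)
for row in rows:
    print(row)
```

The heading row is skipped because its class does not match the predicate; the same select-by-attribute idea is what an XPath query against the real document would express.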

HtmlAgilityPack

While I don’t know how fragile the NHL’s PbP HTML format is, my goal is to harden the parsing algorithm over 5-10 games. Tonight I wrote the basic parsing algorithm and nearly made it through the first game of the season before crashing out in the second period with a null reference exception during the player parsing. Still, it’s a good place to stop because I made significant headway: the base code is pretty solid, but the edges need work.
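I haven't tracked down the cause of that crash yet. One common source of null errors in this kind of code is a query that matches nothing: HtmlAgilityPack's `SelectNodes` and `SelectSingleNode` return null in that case rather than an empty result. The sketch below shows the defensive pattern in Python, where `ElementTree.find` similarly returns `None`. The markup, the `onice` class, the `PEND` event code, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: the second one has no on-ice cell at all,
# which is the kind of edge case that can trip up a naive parser.
ROWS = """
<table>
  <tr class="evenColor"><td>SHOT</td><td class="onice"><font>12</font><font>34</font></td></tr>
  <tr class="evenColor"><td>PEND</td></tr>
</table>
""".strip()


def on_ice_numbers(tr):
    """Return jersey numbers from the on-ice cell, or [] if the cell is missing."""
    cell = tr.find("td[@class='onice']")
    if cell is None:  # nothing matched: return an empty list instead of crashing
        return []
    return [
        int(f.text)
        for f in cell.findall("font")
        if f.text and f.text.strip().isdigit()
    ]


def parse_rows(markup):
    """Return (event code, on-ice jersey numbers) for each event row."""
    root = ET.fromstring(markup)
    result = []
    for tr in root.findall(".//tr[@class='evenColor']"):
        event = tr.findtext("td", default="").strip()
        result.append((event, on_ice_numbers(tr)))
    return result


parsed = parse_rows(ROWS)
print(parsed)
```

Checking for the missing node at the point of lookup keeps one odd row from ending the whole parse.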

Parsing PbP using HtmlAgilityPack

I think that after one more night I’ll be able to get through one complete game, and after another we should have a polished NHL PbP scraper. Then we’ll refine our database so that it can hold the season data. These first few weeks will be hard because we have to build out our database before we can even do any analysis, but we’re making progress.

Project: Corsi Database – Step 1

The first step of the Corsi Database is complete. I’ve been able to parse the final game of the 2014 Stanley Cup, using the data provided in the NHL’s game play-by-play document.

The goal of the Corsi Database is to provide Corsi-type stats for the 2014-2015 NHL season. As a high-level goal, I’m trying to rebuild the infrastructure previously implemented by Extra Skater (Darryl Metcalf), which unfortunately is now offline.
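For anyone new to the stat: Corsi counts every shot attempt (goals, saved shots on goal, missed shots, and blocked shots), and is usually reported as attempts for, attempts against, and the "for" share of the total. The Python sketch below is a rough illustration of that arithmetic, not the database's implementation. The event codes, the `corsi` function, and the sample events are assumptions, each event is assumed to be already attributed to the team that attempted the shot, and no filtering by game situation (such as even strength only) is applied.

```python
from collections import Counter

# Assumed event codes for the four kinds of shot attempt.
SHOT_ATTEMPT_EVENTS = {"GOAL", "SHOT", "MISS", "BLOCK"}


def corsi(events, team):
    """Return (Corsi for, Corsi against, Corsi for %) for `team`.

    `events` is an iterable of (event_type, attempting_team) pairs.
    """
    attempts = Counter()
    for event_type, attempting_team in events:
        if event_type in SHOT_ATTEMPT_EVENTS:
            attempts[attempting_team] += 1
    cf = attempts[team]
    ca = sum(n for t, n in attempts.items() if t != team)
    total = cf + ca
    pct = 100.0 * cf / total if total else None  # undefined with no attempts
    return cf, ca, pct


# Made-up events, not taken from any real game.
events = [
    ("FAC", "HOME"), ("SHOT", "HOME"), ("MISS", "AWAY"), ("BLOCK", "HOME"),
    ("HIT", "AWAY"), ("GOAL", "HOME"), ("SHOT", "AWAY"),
]
print(corsi(events, "HOME"))  # (3, 2, 60.0)
```

Faceoffs and hits are ignored, so the home side ends up with three attempts for and two against.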

Currently, I have this set up as a WinForms application, but I will migrate it to an ASP.NET MVC application so that the data is more readily accessible.

Phase 1 - Parsing NHL Play-by-Play