RantFever 4

I pontificate but not in the pejorative sense of the word.

Crossword Puzzle Download

Posted 12 October 2014
Written by Abinadi Ayerdis
Category PHP

Across Lite

My favorite crossword puzzles are the New York Times crossword puzzles. They are sort of the standard of modern crossword puzzles, and the program that avid puzzlers use to solve them, when they aren't putting pen to paper, is called Across Lite. Unfortunately, the only version they have for Linux is 32-bit. Boo! And it didn't install on my computer. Double boo! Luckily, the program itself is straightforward enough to run in Wine without any issue. So I installed it under Wine.

Cool. Now what?

Across Lite comes with a few default puzzles to check out. These are just demo puzzles and there are only a handful, so now I needed some puzzles. Preferably, I needed New York Times puzzles. Unfortunately, they do not give them away for free. In order to get access to the puzzles, you need a NYT subscription. Bummer. But maybe Google could point me to a viable alternative.

Google, thou art my hero.

After one second of Googling, I discovered that About.com has a repository of New York Times crossword puzzles. They are easily found here: http://puzzles.about.com/od/freeeasynytimescrossword/

Enter About.com

Thank you, About.com, for having Monday-, Tuesday-, and Wednesday-level difficulty NYT crosswords. The only problem was that they have a separate page for each puzzle file. At least, I couldn't find a single bulk download of all the puzzles after a whole minute of searching. However, I did notice that the files were all in the same directory, and they used an easy-to-parse naming structure. So I could theoretically write a script that would download them all for me. So I did just that.

The Breakdown

Each puzzle file was named in this format: Mdy.puz, where M = the first three letters of the month name, d = a two-digit day, and y = a two-digit year. In other words, it is a date, e.g. May0106.puz or Oct1909.puz. So all I need to do is loop through some dates and try to download a file named after each date. I noted that their earliest files were dated May 2006 and their most recent files were from October 2009, which means that they had Monday-, Tuesday-, and Wednesday-level puzzles for about three and a half years.
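
To make the naming scheme concrete, here is a minimal sketch of how PHP's DateTime::format() turns dates into these file names. The function candidateFilenames() and its parameters are names I made up for illustration; they are not part of the script below. It simply lists the file names for the wanted weekdays in a date range, without downloading anything.

```php
<?php
// Hypothetical helper (illustration only): list the .puz file names for the
// wanted weekdays between two dates, both inclusive.
function candidateFilenames(string $first, string $last, array $weekdays): array
{
    $start = new DateTime($first);
    // DatePeriod excludes its end date, so push the end one day forward.
    $end = (new DateTime($last))->modify('+1 day');

    $names = [];
    foreach (new DatePeriod($start, new DateInterval('P1D'), $end) as $day) {
        // 'l' is the full weekday name, e.g. "Monday"
        if (in_array($day->format('l'), $weekdays, true)) {
            // 'M' = three-letter month, 'd' = two-digit day, 'y' = two-digit year
            $names[] = $day->format('Mdy') . '.puz';
        }
    }
    return $names;
}

// The first week of May 2006 yields May0106.puz, May0206.puz and May0306.puz.
print_r(candidateFilenames('2006-05-01', '2006-05-07', ['Monday', 'Tuesday', 'Wednesday']));
```

Listing the names first is a cheap way to sanity-check the date format before pointing the script at a real server.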

The Script (in PHP)

// The date to begin: May 1, 2006
$day = new DateTime('2006-05-01');

// An array of days that we want to download
$days = ['Monday','Tuesday','Wednesday'];

// Where the puzzles are located
$url = 'http://puzzles.about.com/library/across/';

// This is to pass to the file_get_contents() function later so that we won't spend very much time on each attempt
$context = stream_context_create(
		['http' => ['timeout' => 1]]
	);

// Here is the loop we are going through. I chose a do-while so that the check would be at the 
// end of the loop instead of the beginning.
// This is not important, really any loop would do.
do {
	echo $day->format('Mdy'); // So I could see each day as it was considered.
	$filename = $day->format('Mdy') . '.puz'; // Construct the name of the file we are going to download
	
	// Check to see that the current day being considered is one of the days we want
	if(in_array($day->format('l'),$days)) {
		// Here is the magic. The actual download.
		$puz = file_get_contents($url . $filename, false,$context);

		// Sometimes, they missed days. When that happened, there is a redirect. This check is to make sure we got an actual puzzle.
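		if($puz && substr($puz,0,9) != '<!DOCTYPE') {
			// Now create the file on my hard drive. Note that it puts the Monday files in a Monday folder, Tuesday in a Tuesday, etc.
			$dir = 'puzzles/'.$day->format('l');
			if(!is_dir($dir)) mkdir($dir, 0777, true); // file_put_contents() will not create missing folders, so make sure it exists
			file_put_contents($dir.'/'.$filename, $puz);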
			echo ' - Checks out<br>';
		} else {
			echo ' - Skipped<br>';
		}
	} else {
		echo ' - Skipped<br>';
	}

	// Move to the next day.
	$day->modify('+1 day');
} while($day->format('Mdy') != 'Oct2009'); // Stop at Oct 20, 2009, so the last one we check is Oct 19, 2009
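
The redirect check can also be pulled out into its own small function, which makes it possible to test without hitting the network. This is only a sketch: looksLikePuzzle() is a name I made up, and it assumes, like the script above, that a missing puzzle comes back as an HTML page.

```php
<?php
// Hypothetical helper (illustration only): decide whether a downloaded body
// is worth saving. Assumes a missing puzzle is answered with an HTML page.
function looksLikePuzzle($body): bool
{
    // file_get_contents() returns false on failure; also reject empty bodies.
    if ($body === false || $body === '') {
        return false;
    }
    // Ignore leading whitespace and letter case when looking for an HTML page.
    $head = strtolower(substr(ltrim($body), 0, 9));
    return $head !== '<!doctype' && strncmp($head, '<html', 5) !== 0;
}

var_dump(looksLikePuzzle('<!DOCTYPE html><html>...')); // an HTML page, so not a puzzle
var_dump(looksLikePuzzle(false));                      // a failed download
```

In the loop, the condition would then read if(looksLikePuzzle($puz)) instead of the inline substr() comparison.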

I used PHP because I know it. This was a pretty basic script that would be easy to write in any language. It gave me some four hundred files, give or take. It took me no time at all to write, but a while to run. After half an hour or so, though, I was puzzling it up!
