AI research involves software development, and all software development involves regular and extensive testing. Alongside the AI curriculum I've been working on for years now (see my blog of 10 April 2015 and references therein), one would want a suitable set of tests.
I believe that intelligence, values/utility, consciousness, etc. are all complex vector quantities. Although a single test can certainly try to measure more than one quantity, it seems likely that more than one test will be needed to gauge an AI's overall performance. Jia You recently described how the Turing test might be replaced by a battery or suite of tests (Science, 9 Jan. 2015, pg 116).
In testing my own code I typically start with simple, and then more complex, logic functions (see chapter 1 of my book, Twelve Papers, for example). For me, a follow-on test is often character recognition. But where should one go from there? I think a good test suite can only be developed in conjunction with the AI curriculum. Perhaps the "test first" school of software development would have us create the test suite first and the AI curriculum afterward. In designing an intelligence, I think the opposite order might be more reasonable, or perhaps working through curriculum and tests together in an iterative fashion.
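Here is a minimal sketch of the kind of graded test battery described above. It assumes that a "learner" is any function that trains on a truth table and returns a predictor. The sketch scores that learner on logic functions of increasing difficulty and returns one score per test, so the result is a vector rather than a single number. The names (`TEST_SUITE`, `run_suite`, `perceptron_learner`) and the choice of a single-layer perceptron as the learner under test are illustrative assumptions. They are not the method used in Twelve Papers.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Example = Tuple[Tuple[int, ...], int]
Predictor = Callable[[Tuple[int, ...]], int]
Learner = Callable[[List[Example]], Predictor]


def truth_table(n_inputs: int, fn: Callable[..., int]) -> List[Example]:
    """Enumerate every binary input pattern together with its target output."""
    return [(bits, fn(*bits)) for bits in product((0, 1), repeat=n_inputs)]


# Hypothetical test battery, ordered roughly from simple to harder.
TEST_SUITE: Dict[str, List[Example]] = {
    "AND": truth_table(2, lambda a, b: a & b),
    "OR": truth_table(2, lambda a, b: a | b),
    "MAJORITY3": truth_table(3, lambda a, b, c: int(a + b + c >= 2)),
    "XOR": truth_table(2, lambda a, b: a ^ b),
    "PARITY3": truth_table(3, lambda a, b, c: a ^ b ^ c),
}


def run_suite(learner: Learner) -> Dict[str, float]:
    """Train the learner on each truth table; report accuracy per test (a score vector)."""
    scores: Dict[str, float] = {}
    for name, examples in TEST_SUITE.items():
        predict = learner(examples)
        correct = sum(predict(x) == y for x, y in examples)
        scores[name] = correct / len(examples)
    return scores


def perceptron_learner(examples: List[Example], epochs: int = 1000) -> Predictor:
    """Stand-in learner (an assumption for illustration): a single-layer perceptron."""
    n = len(examples[0][0])
    w = [0] * n
    b = 0

    def predict(x: Tuple[int, ...]) -> int:
        # Fire (output 1) only when the weighted sum plus bias is strictly positive.
        return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            err = y - predict(x)  # +1, 0, or -1
            if err:
                mistakes += 1
                for i in range(n):
                    w[i] += err * x[i]
                b += err
        if mistakes == 0:  # every pattern classified correctly; stop early
            break
    return predict


if __name__ == "__main__":
    print(run_suite(perceptron_learner))
```

With this stand-in learner, the linearly separable tests (AND, OR, MAJORITY3) score 1.0. XOR and PARITY3 cannot reach 1.0, because no single-layer perceptron can separate them. A graded suite is meant to expose exactly this kind of difference. A more capable learner would be plugged into `run_suite` in the same way.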