Performance-based tests have a long history in applied psychology. Performance tests first advanced the science of psychology in brass-instrument laboratories and in ability and intelligence testing. Although performance testing ranged over a wide variety of tasks, a common denominator was the presence of an independent recorder, usually other than the person tested, in the form of a kymograph or other instrument or, as in intelligence testing, an examiner. Individual differences in approach, process, and timing could be observed and recorded. This led to a plethora of observations, data, and conjecture.

Performance testing is superbly suited to the study of complex processes, individuality, and hypothesis generation.  This is because the field of observation is limited to a single subject at a time and constrained only by the task and the recording instrumentation.

Evaluation of performance testing may be criterion-, standards-, or norm-based. Performance testing may be task-oriented or situational and narrowly or broadly defined; task, time frame, and content or situation may be infinitely varied.

Well-conceived performance tests provide the “gold standard” for psychological research and assessment. “Well-conceived” means that the tests possess sound “construct” validity and do what they purport to do. Well-conceived performance tests are found in psychophysiology, neurology, neuropsychology, and intelligence testing.

The better the control and the finer the detail of recording and observation, the more likely performance tests are to yield useful or conceptually significant results. Current neuropsychological investigations that employ fMRI are performance measures par excellence. fMRI investigations employ single subjects whose behaviors or responses are observed and recorded in as much detail as current technology permits. In science, measures that do not stay abreast of current technology and take advantage of the greater precision or detail that technology provides are replaced.

Earlier Generation Performance Tests of Personality

Turn-of-the-century personality theories inspired performance testing that aimed at eliciting unconscious cognitive or emotional biases that influenced how the respondent constructed perception and meaning. Symmetric inkblots and drawings depicting emotionally evocative situations served as inputs for the classical projective tests, the Rorschach and the Thematic Apperception Test (TAT).

For decades, these tests were described as “projective” instruments with the idea that distortions and determinants that were “unconscious” to respondents would find their way into their responses. Instead of being known as performance tests, which they are, the tests were christened “projective” to reflect their authors’ theoretical assumptions underlying their interpretation. These assumptions, whether true or not, placed a heavy burden upon these tests to demonstrate some unbiased connection between test results and deeper “unconscious” dynamics. These endeavors were fraught with methodological conundrums, and results were spotty and hard to replicate.

As is well known, the traditional tests, the Rorschach and TAT, often were plagued by loose standards of administration and codification and by ad hoc interpretation. Yet for all the technical issues that rendered results from these tests dubious or difficult to replicate, the Rorschach and TAT permitted unique responses which related to the respondent’s unique life experience. Respondents were not forced to adhere to an imposed language to characterize their perspective. They used their own language. Respondents were not requested to self-report. Their responses were observed and recorded by an examiner who made an independent evaluation by reference to external criteria or norms.

Originally the Rorschach and TAT were given in a relatively informal manner which allowed easy interaction between examiner and respondent. Timing was not critical. The tests were “open-ended” with no firm “upper limit” to responses. Therefore, a single Rorschach test might vary from a few percepts to a hundred or more and take from 30 minutes to several hours to administer. The TAT was given over a two-day period with half the cards on one day and the other half on the day following.

The Rorschach

The Rorschach greatly appeals to clinicians due to the clarity and simplicity of its task demands, “What might this be?” with ten cards handed in sequence. No doubt its ease of administration has much to do with its popularity, but so too does observation of the respondent’s unique style and the opportunity for informal observation that accompanies its administration.

During its history, the relatively standard administration of the Rorschach nurtured research and the development of coding systems, eventuating in John Exner’s synthesis of the Rorschach literature and systems. Meyer, Viglione, et al.’s (2011) recent systematization of the Rorschach furthers Exner’s project of establishing a firm and enduring foundation for Rorschach assessment. Testing has been standardized to limit the number of responses to a maximum of 40, with the approximate testing time reduced to one hour or less. Interviewer interventions are strictly defined. Seating is side-by-side. Assumptions about the unconscious are kept to a minimum, with the test employed to evaluate individual differences in accord with clinical observation. The revisions to Rorschach administration and scoring proposed by the authors address many of the criticisms directed at the Rorschach. The test has been rendered more reliable and valid.

Yet for all these changes, the Rorschach is a test that focuses upon “products” of perception, that is, responses are taken as “percepts” to be coded according to content, form, and other features. The “sums” of content, form, and other features, taken percept by percept, yield “overall” scores which provide an individual profile. Personality is a holistic construct, yet the Rorschach, along with the majority of psychological tests, attempts to explain or illuminate the personality as a whole by reference to characterization of isolated elements. In this sense, the Rorschach is reductive.

This is not so much a criticism as an observation. In important respects, much of science is reductive, but a question remains: in what way is a test reductive, and how reductive? If observation is limited to independent percepts, then relationships that might emerge through broader examination or a more inclusive range of observation become invisible. Cell biology fails to characterize or describe social behavior, although once social behavior is described, cell biology may illuminate mechanisms of social behavior. The whole is more than the parts.

The Thematic Apperception Test

Children, adolescents, and adults tell stories. Stories provide frames of reference for personal and social orientation. Stories are organizers. The Thematic Apperception Test and kindred measures aim to elicit these “frameworks of meaning” by exposing subjects to dramatic pictures and asking them to tell as dramatic a story as they can for each picture presented, including what has led up to the depicted event, what is happening at the moment, what the characters are feeling and thinking, and what the outcome of the story was. If the Rorschach accentuates individual percepts, the TAT, at least at the level of data collection, evokes broad-based schemas.

The TAT appears particularly valuable for evaluation of the respondent’s “meaning making”, that is, how the respondent links together cause and effect, effort and consequence, aim and outcome. The TAT can address, at least conceptually, how the person organizes the elements of experience into a whole. Yet since the 1960s, the TAT has languished as an assessment tool, not only because of the technical problems already mentioned but also because of its arduous requirements. Respondents must write their stories, addressing examiner questions, or the examiner must transcribe their stories, with recommended testing spanning multiple sessions. Given these requirements, examiners usually select cards from the TAT for their clinical value rather than administer the test systematically. While providing clinical fodder, this piecemeal approach has not furthered scientific use of the instrument.

The Music Apperception Test

The Music Apperception Test (MAT) is a new-generation performance-based test of personality. While the test has the advantages of the relatively open-ended traditional “projective” tests, it lacks their limitations. Instructions are pre-recorded, so differences in instructions due to examiner bias are eliminated. The beginning and end of each segment of the test, and the overall length of the test itself, are predetermined. Everyone who takes the test experiences a standard set of exposures. The test stimuli are continuously changing, with periods of “respite” between “trials”. Attuned performance requires mirroring and interpretation of changing stimuli in accord with the task demand, “to tell a story”.

The test emerged as a new-generation test because inexpensive and readily portable gear allows administration and recording of responses in real time in diverse settings. Real-time recording permits verbatim records and timing of responses. Precise latency, response to musical change, fluency during music, fluency after music cessation, pace of speech, rhythmic attunement, and variance in these measures within and between compositions may be calculated. These measures give a picture of overall responsiveness and “flow” not equivalent to scores for individual percepts or their sum.
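As a rough illustration of how such timing measures might be derived from a real-time recording, the following is a minimal sketch in Python. It assumes a transcript in which every word carries start and end times, plus known music onset and offset times for each composition. The `Word` class, the function names, and the particular definitions of latency and fluency are assumptions made for this example and are not taken from the MAT Manual.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class Word:
    """One transcribed word with its timing (hypothetical format)."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


def timing_measures(words, music_on, music_off, trial_end):
    """Illustrative timing measures for one composition (one trial).

    Assumed definitions, chosen only for this sketch:
    - latency: seconds from music onset to the first spoken word
    - fluency: words per minute, during the music and after it stops
    """
    during = [w for w in words if music_on <= w.start < music_off]
    after = [w for w in words if music_off <= w.start < trial_end]
    spoken = during + after
    latency = min(w.start for w in spoken) - music_on if spoken else None
    return {
        "latency_s": latency,
        "words_per_min_during": 60 * len(during) / (music_off - music_on),
        "words_per_min_after": 60 * len(after) / (trial_end - music_off),
    }


def variability(values):
    """Population variance of one measure across compositions."""
    return pvariance(values)


# Made-up example: music plays from 0 s to 30 s, the trial ends at 45 s.
words = [
    Word("waves", 2.0, 2.4),
    Word("rising", 3.0, 3.5),
    Word("calm", 31.0, 31.4),
]
measures = timing_measures(words, music_on=0.0, music_off=30.0, trial_end=45.0)
```

Measures computed per composition in this way can then be compared within and between compositions, for example by passing the latencies from several trials to `variability`.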

Similarly, MAT coding includes categories for general stylistic features of response. Among these are narrative style and affective fit. Narrative style identifies whether the person weaves a coherent plot-based story, or voices descriptions of sensations, images, and movements, or an admixture of both. The respondent who voices sensations and images tells a story, but by reference to a shifting, changing kaleidoscope of contemporaneous experience. Their approach is radically different from that of respondents who cleave to plot-based stories. The same “content”, assessed by reference to individual words, has different meaning depending upon narrative style. Affective fit is the overall mood or feeling conveyed by a story or description. The vast majority of healthy individuals adapt their imagery or plot-based stories to the mood of the music, as reflected in their narrative pace, imagistic or plot-based outcome, and other variables. Again, affective fit does not inhere in isolated percepts, but in the overall narrative.

Correlatively, recording in real time does not exclude a reductive approach, for example, examining responses word by word. The MAT Manual includes item codes for word content and social interaction (toward, against, and away). The MAT allows application of word dictionaries and other mechanical approaches to codification. The format permits stress and temporal frequency analysis of voice pitch and amplitude. The database provided by real-time recording of a strictly timed test with standard inputs constitutes a resource for multiple approaches to research and assessment.
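To make the idea of dictionary-based codification concrete, here is a minimal sketch in Python that counts words falling into the three social interaction categories named above. The word lists are invented for illustration only and are not the MAT Manual’s item codes. The matching is deliberately crude: exact lowercase word forms, with no stemming and no attention to context.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; NOT taken from the MAT Manual.
DICTIONARY = {
    "toward": {"help", "join", "embrace", "meet"},
    "against": {"fight", "attack", "argue", "chase"},
    "away": {"leave", "hide", "flee", "escape"},
}


def code_transcript(text):
    """Count dictionary hits per category in a verbatim transcript."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, vocabulary in DICTIONARY.items():
            if token in vocabulary:
                counts[category] += 1
    # Report every category, including those with zero hits.
    return {category: counts[category] for category in DICTIONARY}


codes = code_transcript("They meet, argue, and then flee. They hide.")
```

A mechanical count of this kind ignores narrative style and affective fit, which is why it complements rather than replaces coding of the overall response.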

The MAT is a new-generation performance-based personality test. It is a real-time test oriented to the ebb and flow of affect, speech, and verbalization. Unlike the Rorschach, the MAT may be coded for affective attunement and narrative style, inclusive patterns of response that contrast with item-by-item codification of percepts. Unlike the TAT, the MAT is strictly timed, with the Short Form A or B taking less than eight minutes. Yet, like the TAT, the MAT provides information about the general frames of reference the respondent brings to bear on emotionally challenging life experiences. The MAT moves testing from preoccupation with static inputs to dynamic real-time behavior and adaptation.


Meyer, G.J., Viglione, D.J., Mihura, J.L., Erard, R.E., & Erdberg, P. (2011). Rorschach Performance Assessment System: Administration, Coding, Interpretation, and Technical Manual. Toledo, OH: Rorschach Performance Assessment System, LLC.

Contributed by Leland van den Daele


Leland van den Daele, PhD, ABPP, is author of the Music Apperception Test. He is a psychoanalyst and Emeritus Professor of Psychology at the California Institute of Integral Studies, San Francisco, CA.
