It is accompanied by
accio, a web-based, closed-source front-end, for which
pensieve serves as the backend. If you do not want to use hosted or closed-source software, not to worry:
accio is meant as a convenient way of accessing
pensieve for users who may not want to install and learn R. Almost all features accessible in
accio are also available in
pensieve, with the exception (for now) of deploying a web-based Q-Sort.
Though formally a package like many others,
pensieve is in some ways both less and more than the mainstream packages you may be familiar with and on which it is largely based. For now, at least, pensieve offers few advanced algorithms of its own, but merely wrangles data and interfaces with other, powerful packages. This makes it somewhat less than a “real” package, and more of a wrapping layer. At the same time, pensieve is vastly more specialized than many other packages, tailored to the needs of one domain: the scientific study of human subjectivity. This makes it akin to a Domain Specific Language, or mini-language. pensieve is, in other words, a very niche enterprise, much like Q Methodology itself: endowed with great hopes by its creator, and perhaps some yet-unrealised potential.
Good software for Q methodological analyses should fulfill the following criteria:
The following free / open source and closed / proprietary programs are available:
|               | General-Purpose | Special-Purpose               |
|---------------|-----------------|-------------------------------|
| Closed Source | SPSS, STATA     | PCQ                           |
| Open Source   | R               | R: qmethod package, PQMethod  |
The closed-source, general-purpose programs offer some programmatic extensibility, but are poorly suited for Q methodology. These commercial, often expensive offerings also make it hard to reproduce and open up research. The special-purpose
PQMethod was originally written for mainframes in 1992 by John Atkinson at Kent State University in Kent, OH, and has since been ported to PCs and generously maintained by Peter Schmolck at the University of the Armed Forces in Munich, Germany. While nominally free and open-source,
PQMethod consists largely of legacy FORTRAN code, which few people today can read or write.
PQMethod also offers little programmatic extensibility, because it is a standalone program running within layers of emulators.
The qmethod package (Zabala 2014b), implementing Q methodology in the free and open-source R programming language and software environment for statistical computing (R Core Team 2014), is the best fit for the above criteria. R is freely available for all platforms and supported by a vibrant community of developers.
qmethod, while released only recently, has been thoroughly validated (Zabala 2014a), leverages existing packages for extraction and rotation, and conveniently wraps specialized Q functions. Running inside the R environment,
qmethod can be easily extended, now including several functions developed for this dissertation and contributed back to the open source project. Most importantly, using
R, every step from raw data to final factor interpretation can be traced back, using publicly available and vetted code, down to base functions.
Machine-readable code, its results, and human-readable explanation of what that code does are often written and stored separately. Not only does such separation invite mistakes when code, result and explanation diverge, but it also makes research harder to reproduce and is conceptually flawed. At least in the context of statistical analysis, what we want a machine to do, why we want it, and to what effect are one intellectual operation, and should be presented as such. Programs must not be considered black boxes that “do” things of their own accord, to be explained ex-post, but code should be a near equivalent of the same thought expressed in prose.
Donald Knuth has formulated this central tenet of his Literate Programming approach thus:
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do. — Knuth (1984)
pensieve supports this approach by offering plot and print methods suitable for inclusion in prose via the
knitr R package (Xie 2015).
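For instance, when a document is compiled with knitr, a code chunk like the following renders results right next to the prose that motivates them. This is only a hedged sketch: my_loadings is a hypothetical psLoadings object, and the exact method output may differ.

```r
# Inside a knitr/R Markdown code chunk: plot() and print() dispatch
# to the pensieve methods for the class of `my_loadings`
# (a hypothetical psLoadings object), so code, result and
# explanation end up in the same document.
plot(my_loadings)
print(my_loadings)
```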
One of the dirty secrets of quantitative work is that a good 80% of the time is often spent on “cleaning and preparing” messy data (Wickham 2014a, 1). This tedium affects the quantitative stages of studying human subjectivity as much as other analyses, and is best addressed head-on and as early as possible, to avoid downstream complications.
To that end,
pensieve prescribes a large number of standardized data formats, in which otherwise messy data can be tidily stored.1 These standardized data formats come in the form of S3 classes, which is the simplest (and oldest) system for object-oriented (OO) programming in R. There is no need to know anything about S3 or OO in general to use and benefit from this system in
pensieve: Suffice it to say that S3 is a clever way for data to know what kind of data class they are, and what kind of methods can be applied to them. For example,
psLoadings “know” that they are loadings, that their values must lie between -1 and 1, and that they can be plot()ted accordingly.
It is easy to recognize these classes in pensieve: they are all in CamelCase and start with ps, as, for instance,
psItems. They all come with a constructor function, always by the same name (
psItems()), which allows you to create such an object from appropriate inputs. These constructor functions do not do very much; they simply validate the data and assign the class.
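As a sketch, creating a loadings object might look like the following; the constructor name follows the convention above, but the exact arguments are assumptions, so consult help("psLoadings") for the real signature.

```r
library(pensieve)

# a small people-by-factors matrix of loadings
m <- matrix(
  c(0.70, -0.20,
    0.10,  0.80),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(people = c("Lisa", "Peter"), factors = c("F1", "F2"))
)

# validate the data and assign the class (assumed single-argument call)
ldgs <- psLoadings(m)
class(ldgs)  # includes "psLoadings"
```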
Each class (such as
psLoadings) is also accompanied by a method for the
check() generic, which means that you can, well,
check() any object to make sure that it conforms to the standardized data format. For example,
check()ing a psLoadings object will assert that all values are between -1 and 1, as loadings must be by definition, as well as a number of other criteria. To find out about these criteria for valid data, consult the documentation for each of the classes by entering, say, help("psLoadings"). The arguments section will list all of the criteria that are also tested. If your data are partly invalid, the check()s will give you (hopefully) precise and instructive error messages to help you fix them.
check()s are run whenever
pensieve touches any of its known classes, which ensures that data are always in their proper form. This is a simple way to shoehorn type validation into S3, which ordinarily does not have such a facility.
In addition to the check() functions (which return TRUE or the error message), you can also use assert() (which only returns a message in case of an error), test() (which only returns TRUE or FALSE) and expect() (for internal usage in testing via the testthat package). This follows the framework set by the checkmate package.
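Continuing with the hypothetical ldgs object from above, the four flavours can be used as in this sketch; the return behaviours follow the descriptions just given.

```r
check(ldgs)   # TRUE if valid, otherwise a message describing the problem
assert(ldgs)  # silent if valid, otherwise throws an error
test(ldgs)    # TRUE or FALSE, never an error
# expect(ldgs) is intended for unit tests written with testthat
```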
If you only use the functions provided by
pensieve, you will probably never need
check() or any of its siblings:
pensieve will do all necessary checking for you. If, however, you add some custom code of your own, you may want to throw in an occasional assert()ion to ensure that everything is still kosher.
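For example, after manipulating an object outside of pensieve's own functions, a single line re-establishes that it is still valid (again using the hypothetical ldgs from above):

```r
# custom, out-of-band manipulation of a pensieve object ...
ldgs[1, 1] <- 0.5

# ... followed by a quick sanity check before the analysis continues
assert(ldgs)
```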
Besides the check()s, some other methods are defined for common generics for some of the objects, such as customized
plot()ing functions. You can find out about all these special “abilities” of the classes by reading their help.
A word of warning: you can always assign or remove classes by hand, rather than using the provided constructor functions and their included
check()s, but I strongly recommend against this. Because S3 is a relatively simple and ad-hoc system, it is easy to “shoot yourself in the foot” (Wickham 2014b). Downstream
pensieve functions may fail, or worse, produce wrong results if data are not provided in the precise format specified by the included classes.
This design may strike users as involved or overly restrictive, but there is little need to worry. Most users of
pensieve will never be exposed to the OO system “behind the scenes”. More advanced users with custom extensions should also not feel stifled; rather than restricting the extensibility of R, this validation infrastructure merely provides a well-defined interface for extending
pensieve. With such a system in place, all users can rest assured that all reasonable precautions are taken to ensure the reliability and reproducibility of their results, as would be expected from any scientific study of human subjectivity.
In addition to atomic (or vector) data formats, such as
psLoadings, which cannot be divided up further, pensieve also provides many composite (or list) data formats, which are composed of atomic classes or of other list classes. This is easy to illustrate with
psItems, which must always consist of a
psItemConcourse with the full text, but may also contain a
psItemSample, selecting a few items from the concourse.
Higher-level classes such as
psItems also come with their constructor (psItems()) and check() functions, but these expect the lower-level, atomic classes as inputs. Instead of checking their validity, they check the consistency between them. For example, we know that a
psItemSample may only contain items for which there is also an entry in
psItemConcourse; it would not make sense to have an item that is part of the sample, but not the concourse.
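A sketch of such a consistency check, with assumed argument names (the constructor names follow the text):

```r
# a tiny concourse: item handles mapped to the full item wording
concourse <- psItemConcourse(c(
  live_2_work = "Live to work, rather than work to live.",
  work_2_live = "Work to live, rather than live to work."
))

# a sample drawn from the concourse, referenced by item handle
sample <- psItemSample(c("live_2_work"))

# psItems() is assumed to verify that every sampled item
# also exists in the concourse
items <- psItems(concourse = concourse, sample = sample)
```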
This structure of nested classes goes all the way up to a complete
psStudy object, which includes all data relevant for, and produced in, any given study.
The top-level list
psStudy looks like this:
- psStudy: The entire study object.
  - psItems: Additional information on the items, including their full text.
  - psDesign: Parameters of a study, such as the shape of the sorting distribution.
  - psPeople: Additional information about the participants in the study.
  - **psSorts**: The sorts.
This is a fairly long list, though most users will not use all classes and will leave many entries blank. In fact, only very few of the elements are necessary to use the central analytical functions of the package (typeset in bold).
In addition to the hierarchical structure of the nested classes, you will also find some subclasses. Subclasses are used when there is a variant of a parent class, such as when the psItemConcourse consists of images instead of text.
The order of this list approximately follows the progress of most Q studies, and is also reflected in the organisation of this book. Each of the high-level list objects, including their elements, functions and substantive considerations, is covered in the following chapters.
However, there is no technical reason to follow this order or structure of classes; users can organise their data in whichever form they prefer. Some functions accept higher-level, nested lists, but they always also accept the atomic form.
In addition, not all Q studies will have use for all of these objects, or their elements: the nested data structures offered in
pensieve are, for the most part, strictly optional. It is, in fact, possible to run
pensieve with only a minimum of these objects, emphasized in bold in the list above.
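A minimal invocation might therefore look like the following sketch; the argument names are assumptions, and only the raw sorts are treated as strictly required.

```r
# `my_sorts_matrix` is a hypothetical people-by-items matrix of raw Q-sorts
sorts <- psSorts(my_sorts_matrix)

# all other elements can be omitted and filled in later
study <- psStudy(sorts = sorts)
```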
Knuth, Donald E. 1984. “Literate Programming.” The Computer Journal 27 (2).
Leek, Jeff. 2016. “Non-Tidy Data.” Blog. Simplystats. https://simplystatistics.org/2016/02/17/non-tidy-data/.
R Core Team. 2014. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Wickham, Hadley. 2014a. “Tidy Data.” Journal of Statistical Software 59 (10).
———. 2014b. Advanced R. 1st ed. Boca Raton, FL: Chapman and Hall/CRC.
Xie, Yihui. 2015. Knitr: A General-Purpose Package for Dynamic Report Generation in R.
Zabala, Aiora. 2014a. “Qmethod: A Package to Explore Human Perspectives Using Q Methodology.” The R Journal.
———. 2014b. Qmethod: An R Package to Analyse Q Method Data. Cambridge, UK: University of Cambridge.
Wickham (2014a) has suggested a formal definition of tidy datasets where each row is an observation, each column a variable and each table an observational unit. While much of
pensieve follows this philosophy, not all Q objects are best represented as tidy
tibbles. Where rows and columns can be meaningfully transposed, or where linear algebra operations are readily applicable, as is the case for Q-sorts and downstream results, data are instead stored in matrices and higher-dimensional arrays. This is in line with recent recommendations for such non-tidy data by Leek (2016).↩
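To illustrate the footnote's point, Q-sorts kept as a people-by-items matrix can be transposed and correlated directly, which a long, tidy representation would make awkward:

```r
sorts <- matrix(
  c(-1, 0, 1,
     1, 0, -1),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(people = c("Lisa", "Peter"), items = c("i1", "i2", "i3"))
)
t(sorts)       # the items-by-people view is just as meaningful
cor(t(sorts))  # person-by-person correlations, the starting point of Q analysis
```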