Experiences Using Systematic Review Guidelines

On Tuesday, Mahmood Niazi presented a conference paper of ours, Experiences Using Systematic Review Guidelines, at EASE 2006. Systematic reviews use an explicit and characteristic methodology, and are intended to be an unbiased and replicable study of a specific research question. They have quite a different purpose from conventional literature reviews. Barbara Kitchenham had previously written some guidelines about performing systematic reviews in software engineering research. We used the guidelines while performing our systematic review, and our EASE 2006 paper commented on both systematic review in general and these guidelines in particular.
Overall, we thought systematic review was a valuable approach for software engineering. Many papers in software engineering are experience-based case studies. Systematic review can synthesise results from all relevant case studies, revealing patterns and information not readily apparent in any single case study.
We also thought Barbara’s guidelines were useful. However, we’d like to see more guidance on piloting protocols and assessing the quality of selected studies. The systematic review protocol is a plan for how you’re going to search for, select, assess, and extract data from the primary studies that address your research question. Protocols are valuable, but it’s hard to know when to stop reviewing and piloting the protocol, and when to actually start!
However much you review and pilot a systematic review protocol, it will inevitably change during the execution of the systematic review. So, just as with the debates on agile vs. planned development methodologies, you might ask “Why bother writing a protocol in the first place?” We would say it’s worthwhile, for reasons similar to those described in the classic paper A rational design process: how and why to fake it. A protocol improves the final outcome and helps communicate it to others. In our final technical report, we showed the final idealised protocol because it helps other researchers to understand and replicate our study. However, we also showed the original protocol (as footnotes describing changes) so that other researchers can assess any bias in the original protocol or in our changes.
Choosing a narrow and well-defined research question is critical to being able to reliably select relevant studies. To clarify the scope of our research question, we found it useful to define complementary research questions that were similar to, but different from, our actual research question. For example, our research question was “Why do organisations adopt CMM-based SPI?”, and our complementary research questions included questions like:

Why do practitioners support the adoption of CMM-based SPI?
How do organisations adopt CMM-based SPI?
Why should organisations adopt CMM-based SPI?
What benefits do organisations gain after adopting CMM-based SPI?

The complementary research questions form exclusion criteria for the first stage of selecting studies, while the real research question forms the inclusion criterion for the second stage.
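The two-stage selection described above can be sketched in a few lines of Python. This is only an illustration, not the procedure from our paper: the `Study` class, the question labels, and the rule that a study is excluded in stage one only when every question it addresses is a complementary one are all assumptions made for the example. In practice the labels would come from researchers reading titles, abstracts, and full texts.

```python
from dataclasses import dataclass

# Hypothetical labels for the research questions; the names are illustrative only.
TARGET = "why_organisations_adopt"
COMPLEMENTARY = {
    "why_practitioners_support",
    "how_organisations_adopt",
    "why_organisations_should_adopt",
    "benefits_after_adoption",
}


@dataclass
class Study:
    title: str
    # Questions the reviewers judged the study to address (may be empty if unclear).
    questions: set[str]


def stage_one(studies: list[Study]) -> list[Study]:
    """Exclusion stage: drop studies that address only complementary questions.

    Assumption: a study with no clear label is kept for closer reading.
    """
    return [
        s for s in studies
        if not (s.questions and s.questions <= COMPLEMENTARY)
    ]


def stage_two(studies: list[Study]) -> list[Study]:
    """Inclusion stage: keep studies that address the real research question."""
    return [s for s in studies if TARGET in s.questions]


candidates = [
    Study("A", {"how_organisations_adopt"}),
    Study("B", {TARGET, "benefits_after_adoption"}),
    Study("C", set()),
    Study("D", {TARGET}),
]

shortlisted = stage_one(candidates)
selected = stage_two(shortlisted)
print([s.title for s in shortlisted])
print([s.title for s in selected])
```

Keeping the two stages as separate functions mirrors the way the exclusion and inclusion decisions are made at different times, and makes it easy to record how many studies were dropped at each stage.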
Finally, it’s previously been noted that systematic reviews take a lot of effort, but we also found they take a long time! In a systematic review, a group of researchers go through many rounds of independent work interspersed with joint meetings. It’s hard to schedule joint meetings even with only two researchers who have otherwise busy schedules…
It’d be good to see a central site for software engineering systematic reviews (like the Cochrane Collaboration in medicine), and also to see some attempts to independently replicate some systematic reviews.