Recently, 30 observational methods for the assessment of biomechanical exposures at work were evaluated in a literature review (Takala et al., 2010). Several of the methods were found to be insufficiently tested in terms of validity and reliability. In only a few cases have the components of the methods been validated against technical measurements, and comparisons between the methods' resulting risk levels are rare. The Swedish Work Environment Authority has recently increased the demands on ergonomic risk assessments. These assessments are usually made by ergonomists in occupational health services (OHS).
This study is part of a larger on-going project whose overall purpose is to evaluate six observational methods for the assessment of biomechanical exposures in repetitive work with respect to validity, reliability and usability, and to provide information on which of the methods are best suited for practitioners performing risk assessments of repetitive work. The methods' resulting risk levels are compared not only to each other, but also to ergonomists' "own" risk estimates (i.e., made without any specific method). The specific aim of this sub-study was to investigate the inter-observer reliability of the ergonomists' own risk estimates.
Nine OHS ergonomists, all with more than 5 years of experience in general ergonomic risk assessment, assessed 10 different video-recorded (2–6 minutes) work tasks (supermarket work, meat cutting and packing, engine assembly, cleaning, post sorting and hairdressing). Video sequences from two or three camera angles were synchronized and shown together. For each work task, the ergonomists were given data on the duration of the work task (see Table 1), pause and rest schedules, weights of handled goods, physical factors, and the employees' own ratings of discomfort, work demands and control.
The ergonomists could pause the playback as needed; the maximum allocated time per work-task assessment was 20 minutes. The risk of musculoskeletal disorders and the need for improvements were rated into green (no risk), yellow (investigate further) and red (immediate risk) categories. Ratings were made for eight specific body regions (neck, lower back, and right and left shoulders, arms/elbows, and wrists/hands), and for one overall risk level.
Percent agreement and Light's multi-observer kappa (i.e., Cohen's pairwise kappa averaged over all rater pairs; Light, 1971; Cohen, 1960) were calculated per body region and for the overall risk assessment.
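Both measures are straightforward to compute. The following is a minimal sketch in Python, using hypothetical green/yellow/red ratings rather than the study's actual data:

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two raters rating the same items (nominal categories)."""
    n = len(a)
    categories = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal category frequencies
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

def lights_kappa(ratings):
    """Light's kappa: Cohen's kappa averaged over all pairs of raters.

    `ratings` is a list of per-rater rating lists, all in the same item order.
    """
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Hypothetical ratings (G = green, Y = yellow, R = red) by three raters
# for five work tasks -- illustrative only, not the study's data
r1 = ["G", "Y", "R", "Y", "G"]
r2 = ["G", "Y", "Y", "Y", "G"]
r3 = ["Y", "Y", "R", "R", "G"]
print(round(lights_kappa([r1, r2, r3]), 2))
```

Averaging pairwise kappas, as Light (1971) proposed, extends Cohen's two-rater statistic to the nine-rater design used here.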
For the 720 (9 ergonomists × 8 body regions × 10 work tasks) risk assessments of the separate body regions, 37% were green, 44% yellow and 19% red. For the overall risk assessments (Table 1), 14% were green, 50% yellow and 36% red.
Table 1. Work tasks, hours per work task per work day, and the ergonomists' ratings of overall risk
As seen in Table 1, the consistency between the observers differed markedly. For three of the work tasks all three categories were represented; in only one task did all ergonomists rate the overall risk identically. The average agreement of the ratings was 48% for the body regions and 57% for the overall risk assessments; Light's kappa was 0.18 and 0.30, respectively.
The results showed poor to fair inter-observer reliability according to Altman's guidelines for interpreting kappa (fair: 0.21–0.40; Altman, 1991). These kappa values will, in the larger project, be compared to those of the six systematic observation methods.
Altman DG (1991). Practical Statistics for Medical Research. London: Chapman and Hall.
Cohen J (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Light RJ (1971). Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin, 76(5), 365–377.
Takala EP, et al. (2010). Systematic evaluation of observational methods assessing biomechanical exposures at work. Scandinavian Journal of Work, Environment & Health.