How to Display Data- P10

Shared by: Cong Thanh | Date: | File type: PDF | Pages: 5


How to Display Data - P10: The best method to convey a message from a piece of research in health is via a figure. The best advice that a statistician can give a researcher is to first plot the data. Despite this, conventional statistics textbooks give only brief details on how to draw figures and display data.



Displaying quantitative data

[Figure 4.6 (continued): panel (c), histogram of frequency against height in metres.]

obvious: there are more women than men, and the peak for men occurs at a greater height than for women (about 1.80 m compared to 1.62 m).

The bins or intervals on the horizontal X-axis of the histogram can be labelled in a variety of ways. The bars may be labelled using the midpoint of the corresponding interval, or by placing a label at the start (or end) of the interval, as in Figure 4.6. For histograms, we recommend labelling the horizontal axis at the start (or end) of each interval, since this makes it easier to work out the width of the interval (as in Figure 4.6). Some intermediate interval labels can be omitted to avoid cluttering the scale, without any noticeable loss of clarity, as in Figure 4.6b.

A useful feature of a histogram is that it is possible to assess the distributional form of the data; in particular, whether the data are approximately Normally distributed or skewed. The Normal distribution (sometimes known as the Gaussian distribution) is one of the fundamental distributions of statistics, and the histogram of Normally distributed data will have a classic ‘bell’ shape, with a peak in the middle and symmetrical tails, such as that for the height of women in Figure 4.7b. Skewed data are data which are not symmetrical; positively skewed data have a peak at lower values and a
[Figure 4.7: Separate histograms for the heights of men and women [3]: (a) for men (n = 77) and (b) for women (n = 145). Both panels plot frequency against height in metres.]
[Figure 4.8: Positively skewed data – histogram of baseline ulcer area (cm²) from leg ulcer trial (n = 217) [3]. The plot shows frequency against baseline ulcer area (cm²).]

long tail of higher values (Figure 4.8), while conversely negatively skewed data have a long left-hand tail at lower values, with a peak at higher values (see Figure 4.9).

Histograms are similar to bar charts in that the variable of interest is displayed on the horizontal axis (X-axis) and the frequencies are displayed on the vertical axis (Y-axis). However, bar charts are used for discontinuous data, where the categories are entirely separate, while histograms are used for continuous data. Thus bar charts have gaps between the categories on the horizontal axis to emphasise that the categories are completely separate, whereas there are no spaces between the bins of a histogram, as the width of these bins can be set by the investigator.

The count data for the number of deaths from SIDS per day in Table 4.1 could also be displayed as a histogram. This is because there are a large number of categories (14) of deaths per day, and it is reasonable to treat such discrete count data as if they were continuous, at least as far as the statistical analysis goes. However, we would recommend that count data be displayed using bar charts rather than histograms, as the gaps between the bars emphasise that the categories represent discrete whole numbers and cannot take intermediate values (e.g. it is not possible to have 1.3 SIDS deaths per day).
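The distinction between binning continuous data and tallying discrete counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`histogram_counts`, `skewness`, `bar_chart_counts`) are hypothetical, the numbers are invented rather than taken from the leg ulcer trial or Table 4.1, and the moment coefficient of skewness is used here as one common numerical check of the direction of skew, not something the text itself prescribes.

```python
from collections import Counter


def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width intervals [left, left + width).

    Returns (left_edge, count) pairs, so each bar is labelled at the start
    of its interval. Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return [(start + i * width, counts[i]) for i in range(n_bins)]


def skewness(values):
    """Moment coefficient of skewness.

    Positive: long tail of higher values. Negative: long tail of lower values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def bar_chart_counts(counts):
    """Tally discrete counts: one bar per whole number, zeros included."""
    tally = Counter(counts)
    return [(k, tally.get(k, 0)) for k in range(min(tally), max(tally) + 1)]


# Invented ulcer areas in cm^2 (NOT the trial data), bins of width 50 from 0.
areas = [5, 10, 20, 30, 45, 60, 80, 120, 260, 340]
print(histogram_counts(areas, start=0, width=50, n_bins=7))
# -> [(0, 5), (50, 2), (100, 1), (150, 0), (200, 0), (250, 1), (300, 1)]
print(skewness(areas) > 0)  # True: most values are low, with a long right tail

# Invented deaths-per-day counts: each whole number gets its own bar.
print(bar_chart_counts([0, 1, 1, 3]))
# -> [(0, 1), (1, 2), (2, 0), (3, 1)]
```

In the histogram the investigator chooses `start` and `width`, whereas the bar chart has no such choice: its categories are fixed by the whole numbers the count can take.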
[Figure 4.9: Negatively skewed data – histogram of baseline social functioning from leg ulcer trial (n = 233) [3]. The plot shows frequency against SF-36 Social functioning at baseline.]

4.6 Box–whisker plots

Another extremely useful method of plotting continuous data is a box-and-whisker plot, or box plot. This is described in detail in Figure 4.10. As with dot plots, box plots can be particularly useful for comparing the distribution of the data across several groups.

The box contains the middle 50% of the data, with the lowest 25% of the data lying below it and the highest 25% lying above it. The lower and upper edges of the box are the lower and upper quartiles, and the distance between them (the height of the box) is called the interquartile range. The horizontal line in the middle of the box represents the median value, as described in Section 4.4. The whiskers extend to the largest and smallest values, excluding the outlying values. The outlying values are defined as those values more than 1.5 box lengths from the upper or lower edges, and are represented as the dots outside the whiskers.

Figure 4.10 shows box plots of the heights of the men and women in the leg ulcer trial. As with dot plots, the gender differences in height are immediately obvious from this plot, which illustrates the main advantage of the box plot over histograms when looking at multiple groups. Differences in the
[Figure 4.10: Annotated box plots of height for the leg ulcer patients by sex, Men (n = 77) and Women (n = 145), showing what each of the items displayed means [3]. The vertical axis runs from 1.40 to 2.00. Annotations: Median. Height of box: interquartile range; the lower limit is the 25th percentile (lower quartile) and the upper limit is the 75th percentile (upper quartile). Whiskers extend to the last observations within 1.5 times the box height. Outlying values: observations more than 1.5 times the box height away from the upper side of the box.]

distributions of data between groups are much easier to spot with box plots than with histograms. Because of what they display (median, interquartile range, spread), they provide a good summary of the data and are more useful than dot plots for larger datasets, where a dot plot would look rather busy.

Summary

• Display univariate count data using bar charts rather than histograms, unless the number of categories is large enough to be treated as approximately continuous, in which case a histogram can be used.
• Always display continuous data as dot plots if the sample size per group is low (<100 subjects).
• For univariate data a stem and leaf plot can be useful, since all the data are available in the chart.
• Use histograms to show the distribution of single variables.
• To compare groups, for larger samples (say >50 subjects per group), use box–whisker plots.
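The quantities that a box plot displays, as described in Section 4.6, can be computed directly. The following is a minimal sketch, not a definitive implementation: `box_plot_summary` is a hypothetical helper, the heights are invented rather than taken from the leg ulcer trial, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is an assumption (statistical packages differ in how they interpolate quartiles, so box edges may vary slightly between tools).

```python
import statistics


def box_plot_summary(values):
    """Return the quantities a box plot displays for one group."""
    data = sorted(values)
    # Quartile convention is an assumption; other software may differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box (interquartile range)
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations that are not outlying.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Invented heights in metres (NOT the trial data).
heights = [1.50, 1.55, 1.58, 1.60, 1.62, 1.64, 1.66, 1.70, 1.95]
summary = box_plot_summary(heights)
print(summary["median"])        # 1.62
print(summary["whisker_low"], summary["whisker_high"])  # 1.5 1.7
print(summary["outliers"])      # [1.95]
```

Here the value 1.95 lies more than 1.5 box heights above the upper quartile, so it would be drawn as a separate dot, and the upper whisker stops at 1.70.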