Cough is the most common symptom prompting patients to seek medical advice (1, 2). To date, the methods used to assess cough have been primarily subjective and only broadly reflect the impact of cough and/or cough therapies on the quality of life (3). Because cough is episodic in nature, its objective monitoring requires data collection over many hours, followed by real-time aural analysis of the recording, which is equally time-consuming. Ambulatory cough monitoring systems have been proposed based either on sound recordings alone (4) or on simultaneous sound and electromyography recordings (5-6), but their use has remained restricted to the research setting, mainly because a trained operator must manually identify cough events in the recordings, which is an arduous task.
In order to make cough monitoring applicable to clinical practice, it is necessary to develop an accurate automatic cough monitoring system for the recording, detection, and counting of coughs. With the availability of digital recording devices and advances in digital storage media, battery-powered mp3 players/recorders can be used to make high-quality ambulatory sound recordings. Data can be transferred to a personal computer, and the recordings can be used to develop algorithms for cough sound identification (7). Several systems for automatic cough recognition and monitoring based on sound recording have been described more recently (7-10). They are based on automatic cough detection algorithms that operate relatively reliably in the ambulatory setting. Different methods of analysis have been applied to recognize the cough sounds present in a given recording while rejecting other sounds with similar characteristics.
Because these algorithms are not commercially available, we set out to develop our own algorithm for the analysis of cough frequency, based on spectral analysis and the nonlinear method of sample entropy, which quantifies the irregularity of a sound signal. A first step toward automated recognition of cough within a continuous sound recording is to distinguish blindly between cough sound and speech. We have previous experience with cough sound analysis (11, 12). The aim of the present study was to develop an algorithm based on a classification tree for the automatic recognition of cough sound and speech, as the first step in the development of a cough monitoring system.
MATERIAL AND METHODS
The study was approved by a local Ethics Committee and was performed in accordance with the Helsinki Declaration of 1975 for Human Research. We compared speech and voluntary cough sounds recorded from healthy subjects. First, the signal was screened for periods of sound exceeding the noise level within the recordings. These sound events were extracted, and the periods of silence were omitted from further analysis. The extracted sound events were stored in separate files and underwent digital signal processing to calculate the sound characteristics. The sound events were then classified into cough and non-cough events using a classification tree (13).
Subjects
The study group included 20 healthy subjects (15 female - median age 34.3 yr, range 18 - 56 yr; 5 male - median age 47.2 yr, range 26 - 66 yr). All subjects were without any respiratory disease according to personal history and basic examination. Two subjects were excluded from our study because they were not able to perform voluntary cough appropriately.
Recording system
The recording system consisted of a portable digital voice recorder (Sony ICD-MX20, Sony Corporation, China) with a sampling frequency of 8 kHz and a miniature omnidirectional condenser microphone (ATR35s, Audio-Technica U.S., Philippines) with a flat frequency response between 50 and 18 000 Hz. The microphone was attached to the subject's chest and was covered by a plastic foam membrane to suppress sounds coming from the outer environment. The audio signal from the microphone was initially recorded to the memory card of the digital recorder as an MSV file (memory stick voice file). After recording, we transferred this file to a PC and converted it to an 11 kHz 16-bit mono digital wave file (WAV format) using Digital Voice Editor 2.31 software (Sony Corporation).
Protocol
All subjects performed continuous reading of a text from a book with voluntary coughs (46 cough events) performed at the instants indicated in the text. The recording lasted for about 20 min. Before reading the text, the subjects coughed voluntarily three times to obtain their individual cough sound pattern.
Determination of sound events
The first step in the sound analysis was the isolation of sound events from the raw recording. We moved a non-overlapping window over the whole audio signal and calculated the standard deviation (SD) of the signal at each window position. The length of the moving window was 200 points, corresponding to a duration of 18.2 ms. The SD for each window position was compared with an empirically determined threshold value exceeding the background noise level. Portions of the signal containing no sound events showed only a small SD related to the inherent noise present in the signal. Portions of the signal that were below the threshold value were excluded from further analysis. A sound event exceeding the threshold was then subjected to a more detailed detection of its beginning and end (7). For this purpose, we found the time instants at which the SD of the signal exceeded 40% of the previously used threshold at the beginning and end of the sound event. Since a cough sound can be composed of two relatively isolated sounds (double cough sounds), we searched for a sound immediately following a given sound event. If such a sound was present, both sounds were connected into one sound event. Detected sound events were stored in separate WAV files.
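The event isolation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `detect_events`, the default `threshold`, and the `max_gap_windows` parameter (how close a following sound must be to count as "immediately following") are assumptions, since the text reports neither the empirical threshold nor the permitted gap. The population SD is used here; the text does not say which SD estimator was applied.

```python
import statistics

def detect_events(signal, window=200, threshold=0.05,
                  edge_fraction=0.4, max_gap_windows=5):
    """Return (start, end) sample indices of detected sound events.

    threshold and max_gap_windows are hypothetical values chosen for
    illustration only.
    """
    # SD of the signal in consecutive, non-overlapping windows.
    n_windows = len(signal) // window
    sd = [statistics.pstdev(signal[i * window:(i + 1) * window])
          for i in range(n_windows)]

    low = edge_fraction * threshold  # relaxed level for locating the edges
    events = []
    i = 0
    while i < n_windows:
        if sd[i] <= threshold:       # background noise: skip this window
            i += 1
            continue
        first = last = i
        # Extend both edges while the SD stays above 40% of the threshold.
        while first > 0 and sd[first - 1] > low:
            first -= 1
        while last + 1 < n_windows and sd[last + 1] > low:
            last += 1
        events.append([first, last])
        i = last + 1

    # Join an event with the one before it when the silent gap is short,
    # so that a double cough becomes a single sound event.
    merged = []
    for first, last in events:
        if merged and first - merged[-1][1] - 1 <= max_gap_windows:
            merged[-1][1] = last
        else:
            merged.append([first, last])
    return [(f * window, (l + 1) * window) for f, l in merged]
```

Each returned pair can be used to slice the recording into a separate sound event; in practice the threshold would have to be tuned to the noise floor of the recorder and microphone.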
Sound events analysis
A second step was to calculate the characteristics of sound events. We quantified the duration of each sound file (parameter length).
Spectral analysis. Because of the apparent non-stationarity of the sound signal, each sound event was analyzed using a short-time Fourier transform (STFT). We used a Hanning window (512 samples) to reduce spectral leakage, and the window was shifted with a step of 5 samples. The window length corresponds to 45 ms. For each window position, the power spectral density (PSD), smoothed by moving averaging, was computed. Total power (TP), corresponding to the area under the PSD curve, was computed as a measure of the sound intensity for a given shift of the window. The maximum TP value throughout the sound event (global maximum) was denoted as TPmax; it is a measure of the maximum sound intensity. TPmean was a measure of the mean sound intensity and was computed as the arithmetic mean of the TP values for the whole sound event.
Next, we found all the local maxima and minima in the time course of TP in a sound event. We computed the parameter ratio as the sum of the TPs of all local maxima divided by the sum of the TPs of all local minima in a given sound event.
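The intensity measures above (TP per window position, TPmax, TPmean, and ratio) can be illustrated with NumPy. This is a sketch under stated assumptions rather than the original implementation: the function names are hypothetical, a nominal sampling rate of 11 000 Hz is assumed, the PSD uses a standard periodogram normalization, and the moving-average smoothing of the PSD mentioned in the text is omitted because its width is not reported.

```python
import numpy as np

def total_power_course(x, fs=11000, win=512, step=5):
    """Total power (area under the one-sided PSD) per Hann-window position."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(win)
    scale = fs * np.sum(w ** 2)   # periodogram normalization (assumed)
    df = fs / win                 # width of one frequency bin in Hz
    tp = []
    for start in range(0, len(x) - win + 1, step):
        spectrum = np.fft.rfft(x[start:start + win] * w)
        psd = np.abs(spectrum) ** 2 / scale
        psd[1:-1] *= 2            # fold negative frequencies into one side
        tp.append(psd.sum() * df) # rectangle-rule area under the PSD
    return np.array(tp)

def intensity_features(tp):
    """TPmax, TPmean and the maxima-to-minima ratio of a TP time course."""
    tp = np.asarray(tp, dtype=float)
    inner = range(1, len(tp) - 1)
    maxima = [i for i in inner if tp[i - 1] < tp[i] >= tp[i + 1]]
    minima = [i for i in inner if tp[i - 1] > tp[i] <= tp[i + 1]]
    # ratio is undefined when the course has no interior local minimum.
    ratio = tp[maxima].sum() / tp[minima].sum() if minima else float("nan")
    return {"TPmax": tp.max(), "TPmean": tp.mean(), "ratio": ratio}
```

With this normalization the total power of a window equals the window-weighted mean square of the signal, so it tracks loudness over time; a different PSD scaling would change the absolute values but not the shape of the TP course.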
From the spectrum corresponding to the global maximum of the sound, we also computed the skewness (skewnessglobal) and kurtosis (kurtosisglobal) of the PSD value distribution (in the frequency band 0-1000 Hz) to distinguish between harmonic sounds (high peaks in the spectrum) and noise-like sounds (flat spectra). The time of occurrence of the global maximum was marked as the parameter timeglobal.
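To illustrate how skewness and kurtosis of the PSD values separate peaked from flat spectra, the following sketch treats the PSD values in the 0-1000 Hz band as a sample and computes its standardized third and fourth moments. The function name `spectral_shape` is hypothetical, and the plain (non-excess) kurtosis based on population moments is an assumption, as the text does not state which estimator was used.

```python
import math

def spectral_shape(psd, freqs, f_max=1000.0):
    """Skewness and kurtosis of the PSD values at frequencies up to f_max."""
    values = [p for p, f in zip(psd, freqs) if f <= f_max]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:                      # perfectly flat spectrum: undefined
        return math.nan, math.nan
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A spectrum dominated by one high peak yields large positive skewness and kurtosis, whereas a noise-like spectrum with evenly spread values yields small ones, which is the contrast the two parameters are meant to capture.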
In the next step, we found the first local maximum of TP, and its value was divided by the time at which it occurred (measured from the start of the sound event), giving the parameter slope. This measure quantifies the increase in sound intensity at the start of a sound event. The time of occurrence of the first local maximum was marked as the parameter timelocal. Analogously to the global maximum spectrum, skewness (skewnesslocal) and kurtosis (kurtosislocal) were also computed for the spectrum corresponding to the first local maximum.
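The slope and timelocal parameters can be sketched as below. The function name `onset_slope` is hypothetical, and two details are assumptions: time is measured at the start position of each window (the text does not say whether the window start or center is used), and the default sampling rate is the nominal 11 000 Hz.

```python
def onset_slope(tp, fs=11000, step=5):
    """Return (slope, time_local) for the first local maximum of TP.

    tp is the sequence of total-power values, one per window position;
    consecutive positions are `step` samples apart.
    """
    for i in range(1, len(tp) - 1):
        if tp[i - 1] < tp[i] >= tp[i + 1]:
            time_local = i * step / fs   # seconds from the event start
            return tp[i] / time_local, time_local
    return None                          # no interior local maximum found
```

A sound whose intensity rises steeply reaches a high first maximum after a short time and therefore gets a large slope, while a gradual onset gives a small one.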
Nonlinear analysis. For the 512 samples corresponding to the local and global maxima, we computed sample entropy (SampEn) values. SampEn is a measure of the irregularity and unpredictability of a signal; it is therefore higher for noisy signals than for periodic oscillations. SampEn (m, r, N) is the negative natural logarithm of the conditional probability that two sequences similar for m points remain similar at the next point. The algorithm for SampEn computation has been published elsewhere (13).
We presumed that the cough sound would have a higher degree of irregularity than speech. SampEn was calculated for two values of the input parameter r (tolerance; r = 0.1 and r = 0.2 times the SD of the window). The length of the compared sequences (m = 2) and the length of the analyzed window (N = 512 samples) were fixed. The SampEn values for the first local maximum were denoted as SampEnlocal (0.1) and SampEnlocal (0.2). SampEn values corresponding to the global maximum were denoted as SampEnglobal (0.1) and SampEnglobal (0.2).
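A direct, unoptimized sketch of the SampEn definition given above is shown here for illustration. It follows the usual construction (N - m templates, self-matches excluded, Chebyshev distance), but the function name, the use of the population SD for the tolerance, and the non-strict comparison `<= r` are assumptions; published implementations differ in such details.

```python
import math
import statistics

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r, N) with r = r_factor times the SD of the window x."""
    n = len(x)
    r = r_factor * statistics.pstdev(x)

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is <= r.
        # The same n - m start points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)       # pairs similar for m points
    a = count_matches(m + 1)   # pairs still similar at the next point
    if a == 0 or b == 0:
        return math.inf        # undefined; reported here as infinity
    return -math.log(a / b)
```

For a strictly periodic sequence every match of length m extends to length m + 1, so SampEn is zero, whereas for an irregular, noise-like window few matches persist and the value is high. The double loop is quadratic in N, which is acceptable for windows of 512 samples.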
Classification tree
The sound events were then classified into "cough" and "non-cough" sounds using a classification tree. The input parameters for tree construction were all the assessed variables (length, TPmax, TPmean, ratio, slope, skewnesslocal, kurtosislocal, skewnessglobal, kurtosisglobal, timelocal, timeglobal, SampEnlocal (0.1), SampEnlocal (0.2), SampEnglobal (0.1), SampEnglobal (0.2)), and the output parameter of the tree was the presence of a cough or non-cough event.
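As an illustration of how a binary classification tree chooses its splits, the sketch below searches all variables and thresholds for the single split that best separates cough (label 1) from non-cough (label 0) events. It is not the procedure of the statistical package used in the study: the Gini impurity criterion, the function names, and the data layout (one dictionary of variables per sound event) are assumptions. A full tree would apply the same search recursively to each resulting subset.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (variable, threshold, impurity) of the best binary split.

    rows is a list of dicts mapping variable names to values; labels
    holds 1 for cough and 0 for non-cough events.
    """
    best = None
    for name in rows[0]:
        values = sorted({row[name] for row in rows})
        # Candidate thresholds lie midway between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[name] <= thr]
            right = [y for row, y in zip(rows, labels) if row[name] > thr]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (name, thr, score)
    return best
```

Variables that never yield a useful split are simply not selected, which is how a tree built from many candidate variables can end up using only a few of them.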
Statistics
Differences in the sound parameters between cough and non-cough sounds were evaluated using the nonparametric Mann-Whitney U-test. For the classification of sound events, a classification tree was constructed. Values are presented as medians and interquartile ranges because of the non-Gaussian distribution of the variables. Statistical analysis was performed using the statistical package Systat 10 (SPSS Inc.).
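For readers unfamiliar with these statistics, the following sketch computes the Mann-Whitney U statistic by direct pair counting, together with the median and interquartile range of a sample. It only illustrates the quantities named above: the function names are hypothetical, no P value is derived, and the quartile convention (the default of Python's `statistics.quantiles`) may differ from that of the package used in the study.

```python
import statistics

def mann_whitney_u(a, b):
    """U statistic of sample a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)

def median_iqr(values):
    """Median and (lower quartile, upper quartile) of a sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2, (q1, q3)
```

U ranges from 0 to len(a) * len(b); values near either extreme indicate that one group tends to have systematically higher values than the other.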
RESULTS
All variables differed significantly between cough sounds and speech (P<0.001 for all). The variables that were significantly higher in cough sounds than in speech were length, TPmean, TPmax, slope, SampEnlocal (0.1), SampEnlocal (0.2), SampEnglobal (0.1), and SampEnglobal (0.2), while the following variables were significantly lower: timelocal, skewnesslocal, kurtosislocal, ratio, timeglobal, skewnessglobal, and kurtosisglobal.
We made 18 recordings, from which we obtained 6590 sound files consisting of cough and non-cough sounds. The sound files included 892 cough sounds and 5698 non-cough sounds. The tree construction algorithm selected 6 sound characteristics (length, TPmean, slope, ratio, SampEnlocal (0.1), and SampEnlocal (0.2)) as the variables that were most useful for the classification (Fig. 1). The selected variables provide relatively independent information contained in the sound files, which is useful for the classification of sound events. The results of classification tree performance for individual subjects are summarized in Table 1. We compared the classification produced by our algorithm with the manual classification of the sound files into cough and non-cough events. The system performance was evaluated by calculating the sensitivity, the specificity, and the numbers of true positives, true negatives, false positives, and false negatives. The median sensitivity was 100% (interquartile range 98-100%) and the median specificity was 95% (interquartile range 90-97%).
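The performance measures can be computed from paired automatic and manual labels as in the short sketch below. The function names and the encoding (1 for cough, 0 for non-cough, with the manual labels as the reference) are assumptions made for illustration.

```python
def confusion_counts(predicted, reference):
    """Return (TP, TN, FP, FN) for paired 0/1 labels; 1 means cough."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return tp, tn, fp, fn

def sensitivity_specificity(predicted, reference):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(predicted, reference)
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of manually identified coughs that the algorithm also labels as cough, and specificity is the fraction of manually identified non-cough sounds that it correctly rejects. Both are undefined for a recording that contains no coughs or no non-cough sounds.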
Table 1. Results of sound classification in 18 subjects: comparison between the performance of the developed algorithm and manual counting of coughs. The recordings were classified by trained listeners, who marked cough and non-cough sounds. This classification was regarded as the gold standard for evaluating the effectiveness of the classification tree (5). Algorithm performance was evaluated by calculating the sensitivity, the specificity, and the numbers of true positives, true negatives, false positives, and false negatives.
The performance of the algorithm is influenced by the energy level of the cough signal. Lower values of sensitivity and specificity were obtained in subjects who had a lower intensity of cough and speech. The total number of coughs differed between individual recordings, because in some subjects voluntary cough evoked spontaneous cough, and several volunteers forgot to cough at the indicated instants.
Fig. 1. Classification tree used for the distinction of cough sounds from speech. The algorithm chose the six most useful variables for the correct classification of sound events. The algorithm performs stepwise splitting. The top panel contains the entire sample, i.e., all sounds. For illustration, cough sounds, classified manually, form the column on the left-hand side of each panel, while non-cough sounds (speech) are displayed on the right-hand side. Each downward split panel contains a subset of the sample in the panel directly above it. Furthermore, each panel contains the sum of the samples in the corresponding panels below it. Each panel can be thought of as a cluster of objects, or cases, which are split further down the tree. The tree is binary, because each panel is split into only two subsamples. The variables automatically selected for the construction of the classification tree included:
- length: the entire length of the analyzed sound event;
- ratio: the sum of the total powers of all local maxima divided by the sum of the total powers of all local minima;
- slope: the value of the first local maximum divided by the time of its occurrence;
- TPmean: the arithmetic mean of the total power for the whole sound event;
- SampEnlocal (0.1): sample entropy for 512 samples corresponding to the first local maximum for r=0.1;
- SampEnlocal (0.2): sample entropy for 512 samples corresponding to the first local maximum for r=0.2.
DISCUSSION
In the present pilot study we presented an algorithm that is able to distinguish voluntary cough sound from speech. We consider that speech is the most common sound present in the 24-hour recordings obtained from subjects who perform their daily activities. Therefore, we assume that the distinction of the speech from cough sounds is the first step toward developing an ambulatory cough monitoring system.
The median sensitivity of our algorithm was 100% and the median specificity was 95%. The algorithm is based on time domain, spectral, and nonlinear analysis of the sound files. The sounds were classified as cough and non-cough using a classification tree. Generally, a high degree of accuracy, with sensitivity over 95%, was reached. The relatively low sensitivity in 3 subjects (Table 1) can be attributed to a lower effort during voluntary cough in these subjects.
Recently described systems that distinguish between cough and non-cough sounds calculate time-varying spectral features of the analyzed sounds and use hidden Markov models (10) or a probabilistic neural network (9). The values of sensitivity and specificity in those studies are lower than our present results, but the comparison is hindered by differences in the measurement protocols. In the present study, we compared cough sounds with speech, whereas other authors compared cough sounds with sounds occurring during normal daily activities. Our protocol was standardized: the subjects read continuously from a book and coughed voluntarily at the indicated instants.
One of the limitations of this study was the relatively small number of subjects. Our protocol was aimed at distinguishing cough sounds from speech, but other sounds also occur during daily activities. In the future, we want to investigate the performance of our algorithm in distinguishing cough sounds from other sounds, including sneezing and sounds from the outer environment. In addition, we compared voluntary cough sounds with speech. The characteristics of spontaneous cough sounds could differ from those of voluntary cough. Therefore, it is necessary to validate the algorithm in spontaneously coughing subjects.
We conclude that the algorithm developed for the distinction of speech and cough sounds reached a high degree of accuracy, indicating its potential usefulness in clinical practice. In future studies, we plan to improve the developed algorithm to enable the analysis of 24-hour monitoring of cough frequency in subjects performing normal daily activities.
Acknowledgements:
Our thanks are due to T. Zatko and M. Vrabec for their outstanding technical
assistance. This study was supported by European Social Fund Project SOP LZ
2005/NP1-027.
Conflicts of interest: The authors had no conflicts of interest to declare
in relation to this article.
REFERENCES
1. Varechova S, Durdik P, Cervenkova V, Ciljakova M, Banovcin P, Hanacek J. The influence of autonomic neuropathy on cough reflex sensitivity in children with diabetes mellitus type 1. J Physiol Pharmacol 2007; 58 Suppl 5: 705-716.
2. Varechova S, Mikler J, Murgas D, Dragula M, Banovcin P, Hanacek J. Cough reflex sensitivity in children with suspected and confirmed gastroesophageal reflux disease. J Physiol Pharmacol 2007; 58 Suppl 5: 717-728.
3. French CT, Irwin RS, Fletcher KE, Adams TM. Evaluation of a cough-specific quality-of-life questionnaire. Chest 2002; 121: 1123-1131.
4. Subburaj S, Parvez L, Rajagopalan TG. Methods of recording and analysing cough sounds. Pulm Pharmacol 1996; 9: 269-279.
5. Munyard P, Busst C, Logan-Sinclair R, Bush A. A new device for ambulatory cough recording. Pediatr Pulmonol 1994; 18: 178-186.
6. Chang AB, Newman RG, Phelan PD, Robertson CF. A new use for an old Holter monitor: An ambulatory cough meter. Eur Respir J 1997; 10: 1637-1639.
7. Smith JA, Earis JE, Woodcock AA. Establishing a gold standard for manual cough counting: video versus digital audio recordings. Cough 2006; 2: 6, doi:10.1186/1745-9974-2-6, http://www.coughjournal.com/content/2/1/6.
8. Coyle MA, Keenan DB, Henderson LS et al. Evaluation of an ambulatory system for the quantification of cough frequency in patients with chronic obstructive pulmonary disease. Cough 2005; 1: 3, doi:10.1186/1745-9974-1-3, http://www.coughjournal.com/content/1/1/3.
9. Barry SJ, Dane AD, Morice AH, Walmsley AD. The automatic recognition and counting of cough. Cough 2006; 2: 8, doi:10.1186/1745-9974-2-8, http://www.coughjournal.com/content/2/1/8.
10. Matos S, Birring SS, Pavord ID, Evans DH. An automated system for 24-h monitoring of cough frequency: the Leicester cough monitor. IEEE Trans Biomed Eng 2007; 54: 1472-1479.
11. Korpas J, Sadlonova J, Vrabec M. Analysis of the cough sound: an overview. Pulm Pharmacol 1996; 9: 261-268.
12. Korpas J, Vrabec M, Sadlonova J, Javorka M, Javorkova N. Single, double and multi cough sound differentiation. Acta Physiol Hung 2005; 92: 203-209.
13. Richman JS, Moorman JR. Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol 2000; 278: H2039-H2049.