# Speech Classes {#estspeechclass}
EST offers two classes for handling and storing speech information
of all types: Waveforms and Tracks. Both are basically matrices, with
one dimension representing time, the other representing a particular
channel, and the value at each position representing an amplitude.
There are significant differences between them, however, that make
the use of two separate classes preferable to one.
## Waveforms {#waveforms}

Waveforms store digitally sampled acoustic waveforms. They are
composed of a matrix of shorts, where rows represent individual
samples and columns represent channels. Waves can have arbitrarily
many channels, though 1 (mono) and 2 (stereo) are the most common.
Waves are stored as shorts because this is the most common sample
format, which allows fast, direct compatibility with most file
formats and hardware. As each sample is represented by a 16-bit
short, the dynamic range of
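
The sample-by-channel layout described in this section can be sketched in a few lines of C++. This is an illustration only and not the EST API: `SketchWave`, `at`, `duration` and `mix_to_mono` are hypothetical names, and the row-major storage order is an assumption. A signed 16-bit sample can hold values from -32768 to 32767, so the mix-down helper sums in a wider type before averaging.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a waveform; this is NOT the EST_Wave API.
// Rows are samples, columns are channels, values are 16-bit integers.
struct SketchWave {
    std::size_t num_samples;
    std::size_t num_channels;
    int sample_rate;                 // samples per second, per channel
    std::vector<std::int16_t> data;  // row-major storage (an assumption)

    SketchWave(std::size_t samples, std::size_t channels, int rate)
        : num_samples(samples), num_channels(channels), sample_rate(rate),
          data(samples * channels, 0) {}

    // Amplitude of one sample in one channel.
    std::int16_t &at(std::size_t sample, std::size_t channel) {
        assert(sample < num_samples && channel < num_channels);
        return data[sample * num_channels + channel];
    }

    // Duration in seconds implied by the sample count and sample rate.
    double duration() const {
        return static_cast<double>(num_samples) / sample_rate;
    }
};

// Average all channels of one sample into a single value. The sum is
// accumulated in a long so that it cannot overflow the 16-bit range.
std::int16_t mix_to_mono(SketchWave &w, std::size_t sample) {
    long sum = 0;
    for (std::size_t c = 0; c < w.num_channels; ++c)
        sum += w.at(sample, c);
    return static_cast<std::int16_t>(sum / static_cast<long>(w.num_channels));
}
```

With a stereo wave of 16000 samples at 16000 Hz, `duration()` gives one second, and `mix_to_mono` averages the left and right values of a given sample.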
## The Track Class {#trackclass}

The track class is used to represent the outcome of a signal
processing operation on a section of speech. It can be thought of as
representing a series of *frames*, where each frame
represents signal processing information at a specified time point.
### The Amplitude Matrix

Each frame is a set of ordered coefficients, which represent the
output of a signal processing operation on a single section of
speech. For example, a frame may represent a spectrum, a cepstrum, or
a set of linear prediction coefficients. An alternative view is to
visualise the track as a set of channels, where each channel
represents how a particular type of information varies with
time. For instance, a channel might represent how the energy between
500 Hz and 600 Hz varies over the course of an utterance.

Frames and channels are stored as a matrix of floats, where each point
in the matrix represents the amplitude of a given frame and a given
channel.

In addition to the amplitude matrix, tracks also contain a
*time* array, which has the same number of elements
as there are frames in the amplitude matrix. The time array is aligned
one-to-one with the frames: each position in the time array holds
the time of its frame. In many forms of signal processing, frames are
at fixed intervals (often 10 ms), and in such cases it would be
possible to store the interval as a single global value. However, the
track class is extremely general in terms of time positions and allows
frames to be spaced irregularly, which is particularly useful when
dealing with pitch-synchronous processing.
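
The relationship between the amplitude matrix and the time array can be sketched as follows. This is a minimal illustration and not the EST API: `SketchTrack`, `amplitude`, `time`, `fill_fixed_times` and `fill_times_from_periods` are hypothetical names, and placing the first fixed-interval frame at time zero is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a track; this is NOT the EST_Track API.
struct SketchTrack {
    std::size_t num_frames;
    std::size_t num_channels;
    std::vector<float> amp;    // frames x channels, row-major
    std::vector<float> times;  // one time position (seconds) per frame

    SketchTrack(std::size_t frames, std::size_t channels)
        : num_frames(frames), num_channels(channels),
          amp(frames * channels, 0.0f), times(frames, 0.0f) {}

    // Amplitude at a given frame and a given channel.
    float &amplitude(std::size_t frame, std::size_t channel) {
        assert(frame < num_frames && channel < num_channels);
        return amp[frame * num_channels + channel];
    }

    // Time position of a given frame.
    float &time(std::size_t frame) {
        assert(frame < num_frames);
        return times[frame];
    }

    // Fixed-interval case: frame i sits at i * shift seconds
    // (starting at zero is an assumption of this sketch).
    void fill_fixed_times(float shift) {
        for (std::size_t i = 0; i < num_frames; ++i)
            times[i] = static_cast<float>(i) * shift;
    }
};

// Pitch-synchronous case: each frame time is the running sum of the
// pitch periods so far, so the spacing between frames is irregular.
void fill_times_from_periods(SketchTrack &tr,
                             const std::vector<float> &periods) {
    assert(periods.size() == tr.num_frames);
    float now = 0.0f;
    for (std::size_t i = 0; i < tr.num_frames; ++i) {
        now += periods[i];
        tr.time(i) = now;
    }
}
```

Storing one time per frame costs a little memory in the fixed-interval case, but it lets the same structure hold both regularly and irregularly spaced frames without a special case.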
### The Break/Value Array

The track class also contains a *break/value* array,
each element of which also has a one-to-one correspondence with a
frame. In many representations some frames have undefined values, and
the break/value array is used to represent this. For example, F0
contours and formants do not have values during unvoiced sections of
speech, and hence frames representing unvoiced sections may be tagged
as breaks in the break/value array. By default, it is assumed that all
amplitudes are defined and hence no breaks are set at contour
initialisation or resizing.

In time, the break/value array will be replaced by the more general
Auxiliary Matrix.
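
The idea of tagging frames as breaks or values can be sketched with a per-frame flag alongside the amplitudes. This is an illustration only and not the EST API: `SketchContour`, `set_value`, `mark_break`, `resize` and `mean_defined` are hypothetical names, and the contour is reduced to a single channel for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel contour (for example F0);
// this is NOT the EST_Track API.
struct SketchContour {
    std::vector<float> values;   // one amplitude per frame
    std::vector<bool> is_break;  // true where the frame has no defined value

    // Every frame starts out as a defined value: no breaks are set
    // at initialisation.
    explicit SketchContour(std::size_t frames)
        : values(frames, 0.0f), is_break(frames, false) {}

    // Frames added by resizing are also treated as defined values.
    void resize(std::size_t frames) {
        values.resize(frames, 0.0f);
        is_break.resize(frames, false);
    }

    // Store an amplitude and tag the frame as a value.
    void set_value(std::size_t frame, float v) {
        assert(frame < values.size());
        values[frame] = v;
        is_break[frame] = false;
    }

    // Tag the frame as a break, for example an unvoiced frame.
    void mark_break(std::size_t frame) {
        assert(frame < values.size());
        is_break[frame] = true;
    }
};

// Mean over defined frames only; returns 0 if every frame is a break.
float mean_defined(const SketchContour &c) {
    float sum = 0.0f;
    std::size_t n = 0;
    for (std::size_t i = 0; i < c.values.size(); ++i) {
        if (!c.is_break[i]) {
            sum += c.values[i];
            ++n;
        }
    }
    return n == 0 ? 0.0f : sum / static_cast<float>(n);
}
```

Skipping break frames keeps unvoiced regions from dragging statistics such as the mean F0 towards zero.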
### The Auxiliary Matrix

It is inappropriate to store certain information in the amplitude array.
### Sub-tracks, channel and frame extraction

The track class provides an easy mechanism for dealing with a single
portion of the track at a time. Given, say, a track with 10
channels and 500 frames, it is possible to assign a vector to any
single frame or channel, or to assign a sub-track to any contiguous
set of frames or channels. Any values that are changed in the frame or
channel vectors or in the sub-track will affect the underlying track. It
is of course possible to copy values in and out if values need to be
changed without changing the underlying track.
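
The difference between a view that writes through to the underlying track and an independent copy can be sketched as below. This is a minimal illustration and not the EST API: `SketchMatrix`, `SubView`, `sub_view`, `frame_view`, `channel_view` and `copy_out` are hypothetical names, and a plain pointer to the parent is just one simple way to model a non-owning view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is NOT the EST_Track API.
struct SketchMatrix {
    std::size_t rows, cols;   // rows = frames, cols = channels
    std::vector<float> data;  // row-major storage

    SketchMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c, 0.0f) {}

    float &at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// A non-owning window onto a contiguous block of frames and channels.
// Writes through the view land in the parent matrix.
struct SubView {
    SketchMatrix *parent;
    std::size_t first_row, num_rows, first_col, num_cols;

    float &at(std::size_t r, std::size_t c) {
        assert(r < num_rows && c < num_cols);
        return parent->at(first_row + r, first_col + c);
    }
};

SubView sub_view(SketchMatrix &m, std::size_t first_row, std::size_t num_rows,
                 std::size_t first_col, std::size_t num_cols) {
    assert(first_row + num_rows <= m.rows && first_col + num_cols <= m.cols);
    return SubView{&m, first_row, num_rows, first_col, num_cols};
}

// A single frame is a 1 x cols view; a single channel is a rows x 1 view.
SubView frame_view(SketchMatrix &m, std::size_t frame) {
    return sub_view(m, frame, 1, 0, m.cols);
}
SubView channel_view(SketchMatrix &m, std::size_t channel) {
    return sub_view(m, 0, m.rows, channel, 1);
}

// Copy a view into independent storage so that later edits to the copy
// leave the parent untouched.
SketchMatrix copy_out(SubView v) {
    SketchMatrix out(v.num_rows, v.num_cols);
    for (std::size_t r = 0; r < v.num_rows; ++r)
        for (std::size_t c = 0; c < v.num_cols; ++c)
            out.at(r, c) = v.at(r, c);
    return out;
}
```

For a matrix of 500 frames and 10 channels, writing to `channel_view(m, 3).at(7, 0)` changes `m.at(7, 3)`, whereas writing to the result of `copy_out(frame_view(m, 7))` leaves `m` unchanged.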
# Programs {#estspeechclassprograms}

The following programs are available:

- @ref ch_wave_manual : performs basic operations on
  waveforms, such as adding headers, resampling, rescaling, multi to
  single channel conversion etc.
- @ref ch_track_manual : performs basic operations on
  coefficient tracks, such as adding headers, resampling, rescaling,
  multi to single channel conversion etc.
# Classes {#estspeechclassclasses}

# Functions {#estspecchclassfunctions}

## Auxiliary Track Functions {#estspeechclassfuncaux}

- @ref EST_Track_aux_functions