Made changes to the augmentation script to make it work for ASR and speaker ID #3119

Merged

Conversation

phanisankar-nidadavolu
Contributor

No description provided.

@danpovey
Contributor

@david-ryan-snyder please LMK if this is good to merge

@phanisankar-nidadavolu changed the title from "Made changes to the augmentation script make it work for ASR and speaker ID" to "Made changes to the augmentation script to make it work for ASR and speaker ID" on Mar 15, 2019
Contributor

@danpovey danpovey left a comment


Some smallish comments. Big question is what to do with ivectors and CMN and whether to make those changes at the same time as this.


mkdir -p local/musan.tmp

echo "Preparing ${data_dir}/musan..."
Contributor

I think I'd like this script to exit with an error if it detects that what it is trying to create already exists.

# Copyright 2015 David Snyder
# Apache 2.0.
#
# This file is meant to be invoked by make_musan.sh.
Contributor

Since we are re-using this script I think it would be better to work on it a bit more carefully-- e.g. use argparse to parse the command line args (with something more standard than the 'Y' to use the vocals, i.e. make it a boolean flag)-- and have a proper usage message printed out if called wrongly.
Also, I think we can put this in a shared place, steps/augmentation/ or wherever, since other scripts will likely want to call it. The same goes for the shell script-- let's put it somewhere shared, e.g. in the same place; have it check whether its input already exists; give it a usage message; and have it parse command line args properly (e.g. give it a --use-vocals option). Also some kind of function-level documentation for this python script would be nice (e.g. a doc-string).
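A sketch of what that could look like, assuming argparse: the `--use-vocals` option matches the suggestion above, but the positional-argument names here are illustrative, not the script's final interface. The string-valued true/false convention follows other Kaldi python scripts.

```python
import argparse

def make_parser():
    """Build an argparse parser for a make_musan-style script (sketch).

    Positional argument names are illustrative only.
    """
    parser = argparse.ArgumentParser(
        description="Prepare a MUSAN data directory for augmentation.")
    parser.add_argument("--use-vocals", type=str, default="true",
                        choices=["true", "false"],
                        help="If true, also include music files with vocals.")
    parser.add_argument("in_dir", help="Path to the MUSAN corpus")
    parser.add_argument("out_dir", help="Output data directory")
    return parser

# usage: a proper usage message is printed automatically on bad input
args = make_parser().parse_args(["--use-vocals", "false",
                                 "musan/", "data/musan"])
use_vocals = (args.use_vocals == "true")
```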

@@ -0,0 +1,242 @@
#!/bin/bash

Contributor

Put the normal Apache header on this please, and your name.

Contributor

Also can you please add an arg use_ivectors=true that will give the option to skip all the i-vector-related parts of the script? We may want to use this option later on when we switch to online CMN, but for now I suppose we can leave that issue separate; there is no need to mix it up with this.

fi

if [ "$multi_style" == "true" ]; then
  if [ $stage -le 1 ]; then
Contributor

I think you can break this up into more stages, in case it dies somewhere in the middle. And please have more of the stages check whether their work has already been done and die with a suitable message if so.


if [ -e data/rt03 ]; then maybe_rt03=rt03; else maybe_rt03= ; fi

if [ "$multi_style" == "true" ]; then
Contributor

I think we should remove the multi-style option here, for purposes of experiments, since if people didn't want it, they just shouldn't call this. I know this was useful for you... you can of course use it for your own experiments, but I would have thought just setting noise_list to "clean" would do what you want.
Also I don't like the name "noise_list" as reverb and clean are not noise. Let's call it "augmentation_list" wherever it occurs.

else:
    raise Exception("Trying to add both prefix and suffix. Choose either of them")

args.modify_spkr_id = (True if args.modify_spkr_id == "true" else False)
Contributor

An example of how to register boolean options:

    parser.add_argument("--chain.apply-deriv-weights", type=str,
                        dest='apply_deriv_weights', default=True,
                        action=common_lib.StrToBoolAction,
                        choices=["true", "false"],
                        help="")
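For reference, a minimal equivalent of that action class (Kaldi's real one lives in steps/libs/common.py; this sketch just shows the mechanism of converting the strings at parse time):

```python
import argparse

class StrToBoolAction(argparse.Action):
    """Minimal sketch of an argparse action that converts the strings
    'true'/'false' into Python booleans when the option is parsed."""
    def __call__(self, parser, namespace, values, option_string=None):
        # choices=["true", "false"] has already validated the value here.
        setattr(namespace, self.dest, values == "true")

parser = argparse.ArgumentParser()
parser.add_argument("--modify-spk-id", type=str, dest="modify_spk_id",
                    default=False, action=StrToBoolAction,
                    choices=["true", "false"],
                    help="Also modify the speaker id (used in ASR).")
args = parser.parse_args(["--modify-spk-id", "true"])
```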

parser.add_argument('--random-seed', type=int, dest = "random_seed", default = 123, help='Random seed.')

parser.add_argument('--modify-spkr-id', type=str, default = "false", choices=["true", "false"], help='Utt prefix or suffix would be added to the speaker id also (Used in ASR), in speaker id it is left unmodifed' )
Contributor

let's call this --modify-spk-id .. more consistent with spk2utt, etc.

CopyFileIfExists(input_dir + "/reco2file_and_channel", output_dir + "/reco2file_and_channel", args.utt_modifier_type, args.utt_modifier, fields=[0, 1])

if args.modify_spkr_id:
    CopyFileIfExists(input_dir + "/spk2gender", output_dir + "/spk2gender", args.utt_modifier_type, args.utt_modifier)
Contributor

please try to mostly keep within 80 or 100 characters by breaking long lines.

def CopyFileIfExists(utt_suffix, filename, input_dir, output_dir):
    if os.path.isfile(input_dir + "/" + filename):
        dict = ParseFileToDict(input_dir + "/" + filename,
# This function generates a new id from the input id
Contributor

It's good that you are adding documentation, but let's do it with a doc-string, e.g.

def Foo(string):
    """Foo is a function that does nothing.

    'string' is expected to be of type str, but is otherwise ignored.
    This function returns None.
    """
    return None

if utt in utt2wav:
    if use_vocals or not utt2vocals[utt]:
        utt2spk_str = utt2spk_str + utt + " " + utt2spk[utt] + "\n"
        utt2wav_str = utt2wav_str + utt + " sox -t wav " + utt2wav[utt] + " -r 8k -t wav - |\n"
Contributor

I think this script can be used for 16000 Hz sampled audio too; how about making 8k a parameter?
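One way to do that, as a sketch (the function name and default are illustrative, not part of the script): pass the rate in and build the sox pipe string from it instead of hard-coding 8k.

```python
def make_wav_pipe(wav_path, sample_rate=8000):
    """Build a wav.scp pipe entry that resamples with sox (sketch).

    The default of 8000 matches the SWBD recipe; 16000 could be
    passed for wideband setups.
    """
    return "sox -t wav {0} -r {1} -t wav - |".format(wav_path, sample_rate)

# usage: a 16 kHz entry for a hypothetical recording id
entry = "music-001 " + make_wav_pipe("music-001.wav", sample_rate=16000)
```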

@danpovey
Contributor

@david-ryan-snyder or @phanisankar-nidadavolu, perhaps you can comment? And @phanisankar-nidadavolu, let me know if I should be merging this now.

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 18, 2019 via email

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 18, 2019 via email

@david-ryan-snyder
Contributor

The make_musan.py script is just a data prep script in local. Since SWBD uses audio sampled at 8 kHz, we know what the sample rate should be for this script. I don't see why this needs to be an option.

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@david-ryan-snyder
Contributor

If we consider MUSAN important enough to put into steps/ it makes sense to change the usage so that the sample rate is an option. If you go that route, you should be sure to change all of the recipes that use MUSAN (that will include all of the x-vector recipes) so that it uses the new script in steps/ and also be sure to remove the old script in local.

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@david-ryan-snyder
Contributor

Moving it to steps/augmentation or steps/data sounds like a good idea to me.

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 18, 2019 via email

…r on augmented data, moved musan scripts to steps/data and other script level changes that Dan suggested in the PR review earlier
@david-ryan-snyder
Contributor

@phanisankar-nidadavolu, is this PR done and ready to be reviewed?

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 22, 2019 via email

data/${train_set} data/${train_set}_babble

# Combine all the augmentation dirs
# This part can be simplified once we know what noise types we will add
Contributor

Is this comment still relevant?

Contributor Author

I think it is still relevant for now. We have not arrived at a general conclusion yet.


if [ $# -ne 2 ]; then
  echo USAGE: $0 input_dir output_dir
  echo input_dir is the path where the original musal corpus is located
Contributor

"original musal" -> "MUSAN"

# This script creates the MUSAN data directory.
# Consists of babble, music and noise files.
# Used to create augmented data
# The required dataset is freely available at http://www.openslr.org/17/
Contributor

@david-ryan-snyder david-ryan-snyder Apr 22, 2019

Could you add a comment that says something like:

# The corpus can be cited as follows:
# @misc{musan2015,
#  author = {David Snyder and Guoguo Chen and Daniel Povey},
#  title = {{MUSAN}: {A} {M}usic, {S}peech, and {N}oise {C}orpus},
#  year = {2015},
#  eprint = {1510.08484},
#  note = {arXiv:1510.08484v1}
# }

@@ -0,0 +1,256 @@
#!/bin/bash
Contributor

I think this script should go into a new tuning directory, e.g., local/chain/multi_style/tuning/run_tdnn_1a.sh.

Then, in local/chain/multi_style, I suggest creating a symbolic link to run_tdnn_1a.sh called run_tdnn.sh. That would be analogous to what is done in other recipes, such as https://github.com./kaldi-asr/kaldi/tree/master/egs/swbd/s5c/local/chain .

Also, I suggest adding a more detailed comment at the top of this script that also includes some WER results. For example, see https://github.com./kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_lstm_1n.sh.

Contributor Author

Makes sense, but instead of creating the symlink in local/chain/multi_style it is better to create it as local/chain/run_tdnn_multistyle.sh, since all the other tdnn scripts are located there. What do you think?

Contributor

That sounds better.

Contributor Author

I made the changes except for the inclusion of WERs. I will include them once the tdnn finishes training. In my old experiments I never decoded on train_dev.

@danpovey
Contributor

danpovey commented Apr 23, 2019 via email

@danpovey
Contributor

danpovey commented Apr 23, 2019 via email

@david-ryan-snyder
Contributor

I have two more comments on the aesthetics of the PR while you're waiting for the WERs.

I don't think we need the multi_style directory. This doesn't matter much for this particular PR, but as we start adding more scripts that use augmentation, deciding where these scripts will go will become more important. As I understand it, we plan on augmentation becoming part of the standard ASR recipe. I think it makes more sense to put these scripts in the same place people are used to finding them, which is in chain/tuning.
My suggestion is to move the new script to local/chain/tuning/run_tdnn_multistyle_1a.sh and make local/chain/run_tdnn_multistyle.sh be a symbolic link to that.

Also, in my opinion, it's better to refer to these recipes using the word "augmentation" rather than "multi style." In my opinion, using some form of the word "augment" makes it immediately clear what you're doing, and then the terminology would be consistent with the new directory steps/data/augmentation that you're adding. Also, "multi style" sounds old fashioned to me. What do you think?

@david-ryan-snyder
Contributor

david-ryan-snyder commented Apr 25, 2019

During the meeting today, @danpovey suggested that we use the word "aug" for the scripts in this recipe instead of "multi style" or "augmentation." Seems nice and concise, and everyone should know what that means at first glance, I think.

@vimalmanohar
Contributor

I think it is better to call it multi_condition or something to be consistent with existing recipes (in Aspire, AMI and Babel and probably many others) that already use that name for the same purpose. The scripts should go to local/multi_condition or local/chain/multi_condition/.

@david-ryan-snyder
Contributor

@vimalmanohar, my objection to this is that augmentation is now going to become a standard part of all or most of our ASR recipes. I think it's going to be cumbersome to propagate this terminology into the names of scripts for future recipes. If we need a way to distinguish between the past recipes without augmentation, and new recipes that use augmentation, adding the suffix "aug" to the new scripts is a concise (and I think, more intuitive) way to indicate what the change is.

If we have to go with multi_*, I agree multi_condition is better than multi_style since it already exists.

#!/bin/bash

noise_list="reverb1:babble:music:noise"
max_jobs_run=50
Contributor

I think it is better to create a generic version of the script in steps, like steps/copy_ali_dir.sh, with an option such as --prefixes. It would be painful to keep copying this script to every new recipe.

. utils/parse_options.sh

if [ $# -ne 3 ]; then
  echo "Usage: $0 <out-data> <src-ali-dir> <out-ali-dir>"
Contributor

Same thing for this script. Also change ali-dir to lat-dir and give a description of what this script does. <out-data> seems like an inappropriate name as it is actually an input.

cat $dir/lat_tmp.*.scp | awk -v p=$p '{print p$0}'
done | sort -k1,1 > $dir/lat_out.scp.noise

cat $dir/lat_tmp.*.scp | awk '{print $0}' | sort -k1,1 > $dir/lat_out.scp.clean
Contributor

If clean data also needs to be added, then add another option for this, such as --include-original. This type of option is already used in one of the older scripts.

. ./cmd.sh

set -e
stage=0
Contributor

This script should also be called local/nnet3/multi_condition/run_ivector_common.sh. It is the same as the one that is in Aspire, AMI and Babel but uses a new script for augmentation.

Contributor Author

We have made use_ivectors an optional argument. I am not sure run_ivector_common.sh is a valid name in that case. We either have to remove use_ivectors flag and make training of ivectors a default option or name the script prepare_aug_data.sh. Let me know what you think.

Contributor

Hm, maybe prepare_aug_data.sh is OK then. We may not end up using ivectors in all cases. I don't have super strong opinions about that.

Contributor

Maybe run_aug_common.sh?

from reverberate_data_dir import ParseFileToDict
from reverberate_data_dir import WriteDictToFile
import libs.common as common_lib
data_lib = imp.load_source('dml', 'steps/data/data_dir_manipulation_lib.py')
Contributor

I think if this script is being created new, it would be better to stick to the PEP8 standards, which includes using function names like get_args() instead of GetArgs().
But more importantly I think some of the things like imp.load_source may not work in python3. Or at least, it is very old style and should be modified to use import data_dir_manipulation_lib as data_lib.

sys.exit()

fg_snrs = [int(i) for i in args.fg_snr_str.split(":")]
bg_snrs = [int(i) for i in args.bg_snr_str.split(":")]
num_bg_noises = [int(i) for i in args.num_bg_noises.split(":")]
reco2dur = ParseFileToDict(input_dir + "/reco2dur",
Contributor

I think instead of assuming this exists, it is better to first call utils/data/get_reco2dur.sh. Otherwise this is going to be called once for each augmentation that will be done.

Contributor Author

Hi Vimal, the wrapper script prepare_multistyle_data.sh (now changed to run_ivector_aug.sh) first creates reco2dur file in the input directory and then we call this script to augment. I guess it is safe to assume that this file exists (at least for this recipe). Do you want me to modify the script to first check whether the file exists or not, and create the file if it does not exist?

Contributor

You can add it to this script. utils/data/get_reco2dur.sh will already check if the file exists and create it only if not, so you can simply call that script.

@@ -84,9 +84,6 @@ def GetArgs():
return args

def CheckArgs(args):
    if not os.path.exists(args.output_dir):
Contributor

If this is removed, then the script will fail if the output_dir exists.

Contributor Author

@vimalmanohar I don't see why. This part of the code is only creating output dir if it does not exist. Anyways, I am modifying the code to fail if the output directory already exists.

Contributor

I don't think Vimal was asking for it to fail if the output dir exists (and I don't recommend it either).
I think he's pointing out that since, below, you do os.makedirs with no such guard, it will fail if the directory already exists. os.makedirs has an exist_ok parameter which you can set to true.
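That suggestion as a short sketch (exist_ok requires Python >= 3.2; the function name and paths are illustrative):

```python
import os
import tempfile

def ensure_output_dir(output_dir):
    """Create the output dir; with exist_ok=True a re-run of the
    script does not crash when the directory is already there."""
    os.makedirs(output_dir, exist_ok=True)

# Safe to call twice; without exist_ok=True the second call
# would raise FileExistsError.
d = os.path.join(tempfile.mkdtemp(), "aug_out")
ensure_output_dir(d)
ensure_output_dir(d)
```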

Contributor

The original version of the script will re-create the augmented data dir if the script is run again. But Phani modified it so it would do nothing if run again, which I think is unexpected behavior. It should either 1) re-create the augmented data dir, 2) fail with error saying it already exists.

@@ -653,6 +658,12 @@ def Main():
pointsource_noise_addition_probability = args.pointsource_noise_addition_probability,
max_noises_per_minute = args.max_noises_per_minute)

else:
    print("Directory {0} already exists, not creating it again".format(args.output_dir))
Contributor

I don't think it should skip creating it if it's already there. I think it should either recreate it or fail with an error saying it already exists.

@danpovey
Contributor

danpovey commented Apr 26, 2019 via email

@vimalmanohar
Contributor

I guess it makes sense then to use augmentation if all the new recipes have it. But perhaps it is better to have the new recipes in local/chain/augmentation/tuning and then create a symlink in local/chain or local/chain/augmentation.

@danpovey
Contributor

danpovey commented Apr 26, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented May 1, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented May 1, 2019 via email

@phanisankar-nidadavolu force-pushed the augmentation-script-asr-spkrid branch from 3418b62 to 986aebe on May 6, 2019 13:41
@phanisankar-nidadavolu
Contributor Author

Hello, the PR is updated with the changes that you suggested. Main changes made: renaming to aug, moving around files, adding results and some other stuff.

@danpovey danpovey merged commit a861e56 into kaldi-asr:master May 13, 2019
danpovey pushed a commit that referenced this pull request May 16, 2019
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
… for ASR and speaker ID (kaldi-asr#3119)

Now multi-style training with noise and reverberation is an option (instead of speed augmentation).
Multi-style training seems to be more robust to unseen/noisy conditions.
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Dec 17, 2019
* [egs] New chime-5 recipe (kaldi-asr#2893)

* [scripts,egs] Made changes to the augmentation script to make it work for ASR and speaker ID (kaldi-asr#3119)

Now multi-style training with noise and reverberation is an option (instead of speed augmentation).
Multi-style training seems to be more robust to unseen/noisy conditions.

* [egs] updated local/musan.sh to steps/data/make_musan.sh in speaker id scripts (kaldi-asr#3320)

* [src] Fix sample rounding errors in extract-segments (kaldi-asr#3321)

With a segments file constructed from exact wave file durations
some segments came out one sample short. The reason is the
multiplication of the float sample frequency and double audio
time point is inexact. For example, float 8000.0 multiplied
by double 2.03 yields 16239.99999999999, one LSB short of the
correct sample number 16240.

Also changed all endpoint calculations so that they are performed
in seconds, not sample numbers, as this does not require a
conversion in nearly every comparison, and positions in
diagnostic messages are also reported in seconds, not sample numbers.
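The rounding effect described in that commit can be reproduced directly; Python floats are IEEE doubles, which is also the promoted type of the float-times-double product in the C++ code (8000.0 is exactly representable in float, so no extra error comes from the promotion):

```python
# 2.03 is not exactly representable in binary floating point; the
# nearest double is slightly below 2.03, so the product lands just
# under the exact 16240 samples and a truncating cast loses a sample.
t = 8000.0 * 2.03
assert t < 16240.0
assert int(t) == 16239      # truncation: one sample short
assert round(t) == 16240    # rounding recovers the correct count
```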

* [src,scripts]Store frame_shift, utt2{dur,num_frames}, .conf with features (kaldi-asr#3316)

Generate utt2dur and utt2num_frames during feature extraction,
and store frame period in frame_shift file in feature directory.

Copy relevant .conf files used in feature extraction into
the conf/ subdirectory with features.

Add missing validations and options in some extraction scripts.

* [build] Initial version of Docker images for (CPU and GPU versions) (kaldi-asr#3322)

* [scripts] fix typo/bug in make_musan.py (kaldi-asr#3327)

* [scripts] Fixed misnamed variable in data/make_musan.py (kaldi-asr#3324)

* [scripts] Trust frame_shift and utt2num_frames if found (kaldi-asr#3313)

Getting utt2dur involves accessing wave files, and potentially
running full pipelines in wav.scp, which may take hours for a
large data set. If utt2num_frames exists, use it instead if
frame rate is known.

Issue: kaldi-asr#3303
Fixes: kaldi-asr#3297 "cat: broken pipe"

* [scripts] typo fix in augmentation script (kaldi-asr#3329)

Fixes typo in kaldi-asr#3119

* [scripts] handle frame_shift and utt2num_frames in utils/ (kaldi-asr#3323)

subset_data_dir.sh has been refactored thoroughly so that its
logic can be followed easier. It has been well tested and
dogfooded.

All changes here are necessary to subset, combine and verify
utt2num_frames, and copy frame_shift to new directories where
necessary.

* [scripts] Extend combine_ali_dirs.sh to combine alignment lattices (kaldi-asr#3315)

Relevant discussion:
https://groups.google.com/forum/#!topic/kaldi-help/2uxfByEAmfw

* [src] Fix rare case when segment end rounding overshoots file end in extract-segments (kaldi-asr#3331)

* [scripts] Change --modify-spk-id default to False; back-compatibility fix for kaldi-asr#3119 (kaldi-asr#3334)

* [build] Add easier configure option in failure message of configure (kaldi-asr#3335)

* [scripts,minor] Fix typo in comment (kaldi-asr#3338)

* [src,egs] Add option for applying SVD on trained models (kaldi-asr#3272)

* [src] Add interfaces to nnet-batch-compute that expects device input. (kaldi-asr#3311)

This avoids a ping pong of memory to host.

Implementation now assumes device memory.  interfaces will allocate
device memory and copy to it if data starts on host.

Add a cuda matrix copy function which clamps rows.  This is much
faster than copying one row at a time and the kernel can handle the
clamping for free.

* [build] Update GCC support check for CUDA toolkit 10.1 (kaldi-asr#3345)

* [egs] Fix to aishell1 v1 download script (kaldi-asr#3344)

* [scripts] Support utf-8 files in some scripts (kaldi-asr#3346)

* [src] Fix potential underflow bug in MFCC, RE energy floor, thx: Zoltan Tobler (kaldi-asr#3347)

* [scripts]: add warning to nnet3/chain/train.py about ineffective options (kaldi-asr#3341)

* [scripts] Fix regarding UTF handling in cleanup script (kaldi-asr#3352)

* [scripts] Change encoding to utf-8 in data augmentation scripts (kaldi-asr#3360)

* [src] Add CUDA accelerated MFCC computation. (kaldi-asr#3348)

* Add CUDA accelerated MFCC computation.

Creates a new directory 'cudafeat' for placing cuda feature extraction
components as it is developed.  Added a directory 'cudafeatbin' for
placing binaries that are cuda accelerated that mirror binaries
elsewhere.

This commit implements:
  feature-window-cuda.h/cu which implements a feature window on the device
    by copying it from a host feature window.
  feature-mfcc-cuda.h/cu which implements the cuda mfcc feature
    extractor.
  compute-mfcc-feats-cuda.cc which mirrors compute-mfcc-feats.cc

  There were also minor changes to other files.

* Only build cuda binaries if cuda is enabled

* [src] Optimizations for batch nnet3.  The issue fixed here is that (kaldi-asr#3351)

small cuda memory copies are inefficient because each copy can
add multiple microseconds of latency. The code as written
would copy small matrices or vectors to and from the tasks one
after another. To avoid this I've implemented a batched matrix
copy routine.  This takes arrays of matrix descriptions for the
input and output and batches the copies in a single kernel call.
This is used in both FormatInputs and FormatOutputs to reduce
launch latency overhead.

The kernel for the batched copy uses a trick to avoid a memory
copy of the host parameters.  The parameters are put into a struct
containing a static sized array.  These parameters are then marshalled
like normal cuda parameters.  This avoids additional launch latency
overhead.

There is still more work to do at the beginning and end of nnet3.
In particular we may want to batch the clamped memory copies and
the large number of D2D copies at the end.  I haven't fully tracked
those down and may return to them in the future.

* [scripts,minor] Remove outdated comment (kaldi-asr#3361)

* [egs] A kaldi recipe based on the corpus named "aidatatang_200zh". (kaldi-asr#3326)

* [src] nnet1: changing end-rule in 'nnet-train-multistream', (kaldi-asr#3358)

- end the training when there is no more data to refill one of the streams,
- this avoids overtraining to the 'last' utterance,

* [scripts] Fix how the empty (faulty?) segments are handled in data-cleanup code (kaldi-asr#3337)

* [src] Fix to bug in ivector extraction causing assert failure, thx: sray (kaldi-asr#3364)

* [src] Fix to bug in ivector extraction causing assert failure, thx: sray (kaldi-asr#3365)

* [scripts] add script to compute dev PPL on kaldi-rnnlm (kaldi-asr#3340)

* [scripts,egs] Small fixes to diarization scripts (kaldi-asr#3366)

* [egs] Modify split_scp.pl usage to match its updated code (kaldi-asr#3371)

* [src] Fix non-cuda `make depend` build by putting compile guards around header. (kaldi-asr#3374)

* [build] Docker docs update and minor changes to the Docker files  (kaldi-asr#3377)

* [egs] Scripts for MATERIAL ASR (kaldi-asr#2165)

* [src] Batch nnet3 optimizations.  Batch some of the copies in and copies out (kaldi-asr#3378)

* [build] Widen cuda guard in cudafeat makefile. (kaldi-asr#3379)

* [scripts] nnet1: updating the scripts to support 'online-cmvn', (kaldi-asr#3383)

* [build,src] Enhancements to the cudamatrix/cudavector classes. (kaldi-asr#3373)

* Added CuSolver to the matrix class.  This is only supported with
Cuda 9.1 or newer.  Calling CuSolver code without Cuda 9.1 or newer
will result in a runtime error.

This change required some changes to the build system which requires
versioning the configure script. This forces everyone to reconfigure.
Failure to reconfigure would result in linking and build errors on
some systems.

* [egs] Fix perl `use encoding` deprecation (kaldi-asr#3386)

* [scripts] Add max_active to align_fmllr_lats.sh to prevent rare crashes (kaldi-asr#3387)

* [src] Implemented CUDA accelerated online cmvn. (kaldi-asr#3370)

This patch is part of a larger effort to implement the entire online feature pipeline in CUDA so that wav data is transferred to the device and never copied back to the host.
This patch includes a new binary cudafeatbin/apply-cmvn-online.cc which for the most part matches online2bin/apply-cmvn-online.
This binary is primarily for correctness testing and debugging as it makes no effort to compute multiple features in parallel on the device.
The CUDA performance is dominated by the cost of copying the feature to and from the device. While there is a small speedup I do not expect this binary to be used in production.
Instead users will use the upcoming online-pipeline which will take features directly from the mfcc computation on the device and pass results to the next part of the pipeline.

Summary of changes:

Makefile:
   Added online2 dependencies to cudafeat, cudafeatbin, cudadecoder, and cudadecoderbin.
cudafeat:
   Makefile:  added online2 dependency, added new .cu/.h files
   feature-online-cmvn-cuda.cu/h:  implements online-cmvn in CUDA.
cudafeatbin:
   Makefile:  added new binary, added online2 dependency
   apply-cmvn-online-cuda.cc:  binary which mimics online2bin/apply-cmvn-online

Correctness testing:

The correctness was tested by generating a set of 20000 features, then running the CPU binary and the GPU binary and comparing results using featbin/compare-feats.

../online2bin/apply-cmvn-online /workspace/models/LibriSpeech/ivector_extractor/global_cmvn.stats "scp:mfcc.scp" "ark,scp:cmvn.ark,cmvn.scp"
./apply-cmvn-online-cuda /workspace/models/LibriSpeech/ivector_extractor/global_cmvn.stats "scp:mfcc.scp" "ark,scp:cmvn-cuda.ark,cmvn-cuda.scp"

../featbin/compare-feats ark:cmvn-cuda.ark ark:cmvn.ark
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:105) self-product of 1st features for each column dimension:  [ 5.52221e+09 9.1134e+09 5.92818e+09 7.42173e+09 7.48633e+09 7.21316e+09 6.9515e+09 7.03883e+09 6.40267e+09 5.83088e+09 5.01438e+09 5.1575e+09 4.28688e+09 3.529e+09 3.12182e+09 2.28721e+09 1.76343e+09 1.35117e+09 8.72517e+08 5.31836e+08 2.65112e+08 9.20308e+07 1.24084e+07 3.56008e+06 4.25283e+07 1.09786e+08 1.88937e+08 2.60207e+08 3.23115e+08 3.56371e+08 3.69035e+08 3.65216e+08 3.89125e+08 4.07064e+08 3.40407e+08 2.65444e+08 2.50244e+08 2.05726e+08 1.60606e+08 1.07217e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:106) self-product of 2nd features for each column dimension:  [ 5.5223e+09 9.11355e+09 5.92812e+09 7.4218e+09 7.48666e+09 7.21338e+09 6.95174e+09 7.03895e+09 6.40254e+09 5.83113e+09 5.01411e+09 5.15774e+09 4.28692e+09 3.52918e+09 3.122e+09 2.28693e+09 1.76326e+09 1.3513e+09 8.72521e+08 5.31802e+08 2.65137e+08 9.20296e+07 1.2408e+07 3.5604e+06 4.25301e+07 1.09793e+08 1.88933e+08 2.60217e+08 3.23124e+08 3.56371e+08 3.69007e+08 3.65176e+08 3.89104e+08 4.07067e+08 3.40416e+08 2.65498e+08 2.50196e+08 2.057e+08 1.60612e+08 1.07192e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:107) cross-product for each column dimension:  [ 5.52209e+09 9.11229e+09 5.92538e+09 7.41665e+09 7.47877e+09 7.20269e+09 6.93785e+09 7.02284e+09 6.38411e+09 5.81143e+09 4.99389e+09 5.13753e+09 4.26792e+09 3.51154e+09 3.10676e+09 2.27436e+09 1.75322e+09 1.34367e+09 8.67367e+08 5.28672e+08 2.63516e+08 9.14194e+07 1.23215e+07 3.53409e+06 4.21905e+07 1.08872e+08 1.87238e+08 2.57779e+08 3.19827e+08 3.5252e+08 3.64691e+08 3.60529e+08 3.84482e+08 4.02396e+08 3.36136e+08 2.61631e+08 2.46931e+08 2.03079e+08 1.5856e+08 1.05738e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:111) Similarity metric for each dimension  [ 0.99997 0.999871 0.999532 0.999311 0.998968 0.998533 0.998019 0.997719 0.997111 0.996644 0.995941 0.996104 0.995572 0.995028 0.995147 0.994445 0.994258 0.994402 0.994095 0.994084 0.993934 0.993363 0.993015 0.992655 0.992037 0.991645 0.991017 0.990649 0.98981 0.989195 0.988267 0.987222 0.988093 0.98853 0.987442 0.985534 0.986858 0.987196 0.987242 0.986318 ]
 (1.0 means identical, the smaller the more different)
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:116) Overall similarity for the two feats is:0.993119 (1.0 means identical, the smaller the more different)
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:119) Processed 20960 feature files, 0 had errors.
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:126) Features are considered similar since 0.993119 >= 0.99
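The similarity metric that compare-feats reports is, per dimension, the cross-product normalized by the geometric mean of the two self-products (a cosine-style similarity, consistent with the log above: e.g. 5.52209e9 / sqrt(5.52221e9 * 5.5223e9) ≈ 0.99997). A minimal Python sketch of that computation, not Kaldi's actual implementation:

```python
import math

def similarity_per_dim(feats_a, feats_b):
    """Cosine-style similarity per column, as reported by compare-feats:
    cross / sqrt(self_a * self_b); 1.0 means identical."""
    dims = len(feats_a[0])
    self_a = [0.0] * dims
    self_b = [0.0] * dims
    cross = [0.0] * dims
    for row_a, row_b in zip(feats_a, feats_b):
        for d in range(dims):
            self_a[d] += row_a[d] ** 2
            self_b[d] += row_b[d] ** 2
            cross[d] += row_a[d] * row_b[d]
    return [c / math.sqrt(sa * sb) for c, sa, sb in zip(cross, self_a, self_b)]

# Identical feature streams give similarity 1.0 in every dimension.
a = [[1.0, 2.0], [3.0, 4.0]]
print(similarity_per_dim(a, a))  # → [1.0, 1.0]
```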

* [egs] Fixed file path RE augmentation, in aspire recipe (kaldi-asr#3388)

* [scripts] Update taint_ctm_edits.py, RE utf-8 encoding (kaldi-asr#3392)

* [src] Change nnet3-am-copy to allow more manipulations (kaldi-asr#3393)

* [egs] Remove confusing setting of overridden num_epochs variable in aspire (kaldi-asr#3394)

* [build] Add a missing dependency for "decoder" in Makefile (kaldi-asr#3397)

* [src] CUDA decoder performance patch (kaldi-asr#3391)

* [build,scripts] Dependency fix; add cross-references to scripts (kaldi-asr#3400)

* [egs] Fix cleanup-after-partial-download bug in aishell  (kaldi-asr#3404)

* [src] Change functions like ApplyLog() to all work out-of-place (kaldi-asr#3185)

* [src] Make stack trace display more user friendly (kaldi-asr#3406)

* [egs] Fix to separators in Aspire reverb recipe (kaldi-asr#3408)

* [egs] Fix to separators in Aspire, related to kaldi-asr#3408 (kaldi-asr#3409)

* [src] online2-tcp, add option to display start/end times (kaldi-asr#3399)

* [src] Remove debugging assert in cuda feature extraction code (kaldi-asr#3411)

* [scripts] Fix to checks in adjust_unk_graph.sh (kaldi-asr#3410)

bash test `-f` does not work for `phones/` which is a directory. Changed it to `-e`.

* [src] Added GPU feature extraction (will improve speed of GPU decoding) (kaldi-asr#3390)

Currently only supports MFCC features.

* [src] Fix build error introduced by race condition in PR requests/accepts. (kaldi-asr#3412)

* [src] Added error string to CUDA allocation errors. (kaldi-asr#3413)

* [src] Fix CUDA_VERSION number in preprocessor checks (kaldi-asr#3414)

* [src] Fix build of online feature extraction with older CUDA version (kaldi-asr#3415)

* [src] Update Insert function of hashlist and decoders (kaldi-asr#3402)

makes interface of HashList more standard; slight speed improvement.

* [src] Fix spelling mistake in kaldi-asr#3415 (kaldi-asr#3416)

* [build] Fix configure bug RE CuSolver (kaldi-asr#3417)

* [src] Enable an option to use the GPU for feature extraction in GPU decoding (kaldi-asr#3420)

This is turned on by using the option
--gpu-feature-extract=true, which is on by default.  We provide the
option to turn it off because in situations where CPU resources
are unconstrained you can get slightly higher performance with CPU
feature extraction, but in most cases GPU feature extraction is faster
and has more stable performance.  In addition, a user may wish to
turn it off to support models whose feature extraction is currently
incomplete on the GPU (e.g. FBANK, PLP, PITCH, etc.).  We will add those
features in the future, but for now a user wanting to decode such
models should place feature extraction on the host.

* [egs] Replace $cuda_cmd with $train_cmd for FarsDat (kaldi-asr#3426)

* [src] Remove outdated comment (kaldi-asr#3148) (kaldi-asr#3422)

* [src] Adding missing thread.join in CUDA decoder and fixing two todos (kaldi-asr#3428)

* [build] Add missing lib dependency in cudafeatbin (kaldi-asr#3427)

* [egs] Small fix to aspire run_tdnn_7b.sh (kaldi-asr#3429)

* [build] Fix to cuda makefiles, thanks: [email protected] (kaldi-asr#3431)

* [build] Add missing deps to cuda makefiles, thanks: [email protected] (kaldi-asr#3432)

* [egs] Fix encoding issues in Chinese ASR recipe (kaldi-asr#3430) (kaldi-asr#3434)

* Revert "[src] Update Insert function of hashlist and decoders (kaldi-asr#3402)" (kaldi-asr#3436)

This reverts commit 5cc7ce0.

* [src] Update Insert function of hashlist and decoders (kaldi-asr#3402) (kaldi-asr#3438)

makes interface of HashList more standard; slight speed improvement.  Fixed version of kaldi-asr#3402

* [build] Fix the cross-compiling issue for Android under MacOS (kaldi-asr#3435)

* [src] Marking operator as __host__ __device__ to avoid build issues (kaldi-asr#3441)

avoids cudafeat build failures with some CUDA toolkit versions

* [egs] Fix perl encoding bug (was causing crashes) (kaldi-asr#3442)

* [src] Cuda decoder fixes, efficiency improvements (kaldi-asr#3443)

* [scripts] Fix shebang of taint_ctm_edits.py to invoke python3 directly (kaldi-asr#3445)

* [src] Fix to a check in nnet-compute code (kaldi-asr#3447)

* [src,scripts] Various typo fixes and stylistic fixes (kaldi-asr#3153)

* [scripts] Scripts for VB (variational bayes) resegmentation for Callhome diarization (kaldi-asr#3305)

This refines the segment boundaries.  Based on code originally by Lukas Burget from Brno.

* [scripts] Extend utils/data/subsegment_data_dir.sh to copy reco2dur (kaldi-asr#3452)

* [src,scripts,egs]  Add code and example for SpecAugment in nnet3 (kaldi-asr#3449)

* [scripts] Make segment_long_utterance honor frame_shift (kaldi-asr#3455)

* [scripts] Fix to steps/nnet/train.sh (nnet1) w.r.t. incorrect bash test expressions (kaldi-asr#3456)

* [egs] Fixed a bug in egs/gale_arabic/s5c/local/prepare_dict_subword.sh where it could delete words matching '<*>' (kaldi-asr#3465)

* [src,build] Small fixes (kaldi-asr#3472)

* [src] fixed warning: moving a temporary object prevents copy elision

* [scripts] tools/extras/check_dependencies.sh look for alternative MKL locations via MKL_ROOT environment variable

* [src] Fixed compilation error when DEBUG is defined

* [egs] Add MGB-2 Arabic recipe (kaldi-asr#3333)

* [scripts] Check/fix utt2num_frames when fixing data dir. (kaldi-asr#3482)

* [src] A couple small bug fixes. (kaldi-asr#3477)

* [src] A couple small bug fixes.

* [src] Fix RE pdf-class 0, which is valid in pre-kaldi10

* [src,scripts] Cosmetic,file-mode fixes; fix to nnet1 align.sh introduced in kaldi-asr#3383 (kaldi-asr#3487)

* [src] Small cosmetic and file-mode fixes

* [src] Fix bug in nnet1 align.sh introduced in kaldi-asr#3383

* [egs] Add missing script in MGB2 recipe (kaldi-asr#3491)

* [egs] Fixing nnet1 bug introduced in kaldi-asr#3383 (rel. to kaldi-asr#3487) (kaldi-asr#3494)

* [src] Fix for nnet3 bug encountered when implementing deltas. (kaldi-asr#3495)

* [scripts,egs] Replace LDA layer with delta and delta-delta features (kaldi-asr#3490)

WER is about the same but this is simpler to implement and more standard.
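Delta and delta-delta features are computed with the standard regression formula over a small context window (window=2 is the common default). A scalar-per-frame sketch under those assumptions, not Kaldi's add-deltas code:

```python
def add_deltas(feats, window=2):
    """Append delta and delta-delta values computed with the standard
    regression formula d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).
    Here each frame is a scalar for simplicity; real features are vectors."""
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def deltas(seq):
        T = len(seq)
        out = []
        for t in range(T):
            acc = 0.0
            for n in range(1, window + 1):
                # Replicate edge frames at the boundaries, as is conventional.
                fwd = seq[min(t + n, T - 1)]
                bwd = seq[max(t - n, 0)]
                acc += n * (fwd - bwd)
            out.append(acc / denom)
        return out

    d = deltas(feats)       # first-order deltas
    dd = deltas(d)          # delta-deltas (deltas of deltas)
    return list(zip(feats, d, dd))

# On a linear ramp, the interior delta equals the slope (1.0 per frame).
rows = add_deltas([0.0, 1.0, 2.0, 3.0, 4.0])
print(rows[2])  # → (2.0, 1.0, 0.0)
```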

* [egs] Add updated tdnn recipe for AMI (kaldi-asr#3497)

* [egs] Create MALACH recipe based on s5b for AMI (kaldi-asr#3496)

* [scripts] add --phone-symbol-table to prepare_lang_subword.sh (kaldi-asr#3485)

* [scripts] Option to prevent crash when adapting on much smaller data (kaldi-asr#3506)

* [build,scripts] Make OpenBLAS install check for gfortran; documentation fix (kaldi-asr#3507)

* [egs] Update chain TDNN-F recipe for CHIME s5 to match s5b, improves results (kaldi-asr#3505)

* [egs] Fix to kaldi-asr#3505: updating chime5 TDNN-F script (kaldi-asr#3508)

* [scripts] Fixed issue that leads to empty segment file (kaldi-asr#3510)

Fixed issue that leads to empty segment file on SSD disk (more detail: https://groups.google.com/forum/#!topic/kaldi-help/Ij3lQLCinN8)

* [egs] Fix bug in AMI s5b RE overlapping segments (kaldi-asr#3503)

* [egs] Small cosmetic change in extend_vocab_demo.sh (kaldi-asr#3516)

* [src] Cosmetic changes; fix windows-compile bug reported by @spencerkirn (kaldi-asr#3515)

* [src] Move cuda gpu from nnetbin to nnet3bin.  (kaldi-asr#3513)

* [egs] Fix a bug in egs/voxceleb/v1/local/make_voxceleb1_v2.pl when preparing the file data/voxceleb1_test/trials (kaldi-asr#3512)

* [egs] Fixed some bugs in mgb_data_prep.sh of mgb2_arabic (kaldi-asr#3501)

* [src,scripts] fix various typos and errors in comments (kaldi-asr#3454)

* [src] Move cuda-compiled to nnet3bin (kaldi-asr#3517)

* [src] Fix binary name in Makefile, RE cuda-compiled (kaldi-asr#3518)

* [src] buffer fix in cudafeat (kaldi-asr#3521)

* [src] Hopefully make it possible to use empty FST in grammar-fst (kaldi-asr#3523)

* [src] Add option to convert pdf-posteriors to phones (kaldi-asr#3526)

* [src] Fix GetDeltaWeights for long-running online decoding (kaldi-asr#3528)

* Fix GetDeltaWeights for long-running online decoding. Use the frame count relative to the decoder start internally in silence weighting, and convert to the frame count relative to the pipeline only once the result is calculated.

* Add a note about transition to a new function API

* [src] Small fix to post-to-phone-post.cc (problem introduced in from kaldi-asr#3526) (kaldi-asr#3534)

* [src]: adding Dan's fix to a bug in nnet-computation-graph (kaldi-asr#3531)

* [egs] Replace prep_test_aspire_segmentation.sh (kaldi-asr#2943) (kaldi-asr#3530)

* [egs] OCR: Decomposition for CASIA and YOMDLE_ZH datasets (kaldi-asr#3527)

* [build] check_dependencies.sh: correct installation command for fedora (kaldi-asr#3539)

* [src,doc] Fix bug in new option of post-to-phone-post; skeleton of faq page (kaldi-asr#3540)

* [egs,scripts]  Adding possibility to have 'online-cmn' on input of 'nnet3' models (kaldi-asr#3498)

* [scripts] Fix to build_tree_multiple_sources.sh (kaldi-asr#3545)

* [doc] Fix accidental overwriting of kws page (kaldi-asr#3541)

* [egs] Fix regex filter in Fisher preprocessing (was excluding 2-letter sentences like "um") (kaldi-asr#3548)

* [scripts] Fix to bug introduced in kaldi-asr#3498 RE ivector-extractor options mismatch. (kaldi-asr#3549)

* [scripts] Fix awk compatibility issue; be more careful about online_cmvn file (kaldi-asr#3550)

* [src] Add a method for backward-compatibility with previous API (kaldi-asr#3536)

* [src] Filterbank (fbank) feature extraction using CUDA (kaldi-asr#3544)

Following this change, both MFCC and fbank run
through a single code path with parameters
(use_power, use_log_fbank and use_dct) controlling
the flow.

CudaMfcc has been renamed to CudaSpectralFeatures.

It contains an MfccOptions structure which contains
FrameOptions and MelOptions. It can be initialized
either with an MfccOptions object or an FbankOptions
object.

Compared with CudaMfccOptions, CudaSpectralOptions also
contains these parameters

use_dct - switches on the discrete cosine transform and liftering
use_log_fbank - takes the log of the mel filterbank values
use_power - uses power in place of abs(amplitude)

Each of these defaults to on for MFCC. For fbank,
use_dct is set to false. The others are set by user
parameters.

Also added a unit test for CUDA Fbank
(cudafeatbin/compute-fbank-feats-cuda).
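The flag-controlled single code path described above can be sketched as follows. This is a scalar Python illustration of the flow, not the CUDA code; the parameter names `mel_weights` and `dct_matrix` are assumptions standing in for the mel filterbank and DCT coefficient matrices:

```python
import math

def spectral_features(power_spectrum, mel_weights, dct_matrix,
                      use_power=True, use_log_fbank=True, use_dct=True):
    """One code path for MFCC and fbank, as in CudaSpectralFeatures:
    MFCC keeps all three flags on; fbank sets use_dct=False."""
    # use_power: power spectrum vs. abs(amplitude).
    spec = power_spectrum if use_power else [math.sqrt(x) for x in power_spectrum]
    # Mel filterbank: one weighted sum over spectrum bins per mel bin.
    banks = [sum(w * s for w, s in zip(row, spec)) for row in mel_weights]
    if use_log_fbank:
        banks = [math.log(b) for b in banks]
    if not use_dct:
        return banks  # fbank features stop here
    # DCT (liftering omitted here) yields cepstral coefficients (MFCC).
    return [sum(c * b for c, b in zip(row, banks)) for row in dct_matrix]
```

With an identity filterbank and identity DCT on a flat unit spectrum, both paths return zeros (log of 1), which makes the flag behavior easy to check.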

* [src] Fix missing semicolon (kaldi-asr#3551)

* [src] Fix a typo that used the assignment sign in place of not-equal in the CUDA feature pipeline (kaldi-asr#3552)

* [src] Fix issue kaldi-asr#3401 (crash in ivector extraction with max-remembered-frames+silence-weight) (kaldi-asr#3405)

* [egs] Librispeech: in RESULTS, change best_wer.sh => utils/best_wer.sh (kaldi-asr#3553)

* [egs] semisupervised recipes: fixing some variables in comments  (kaldi-asr#3547)

* [scripts]  fix utils/lang/extend_lang.sh to add nonterm symbols in align_lexicon.txt (kaldi-asr#3556)

* [scripts] Fix to bug in steps/data/data_dir_manipulation_lib.py (kaldi-asr#3174)

* [src] Fix in nnet3-attention-component.cc, RE required context (kaldi-asr#3563)

* [src] Temporarily turn off some model-collapsing code while investigating a bug (kaldi-asr#3565)

* [scripts] Fix to data cleanup script RE utf-8 support in utterances

* [src,scripts,egs] online-cmvn for online2 with chain models, (kaldi-asr#3560)

* Add OnlineCMVN to Online NNET2 pipeline.  This is used in some models (e.g.
CVTE).  This is optional and off by default.  It applies CMVN before
applying pitch.  This code is essentially copied out of
"online2/online-feature-pipeline.cc/h".

Patch set provided by Levi Barnes.

* online2: bugfix of config script; include ivector_period in the ivector-extractor config file.

* online-cmvn in online-nnet2-feature-pipeline,

- update the C++ code from @luitjens,
   - introduced `OnlineFeatureInterface *nnet3_feature_` to
     explicitly mark features that are passed to the nnet3 model
   - added the transfer of OnlineCmvnState across utterances from the same speaker
- update the 'prepare_online_decoding.sh' to support online-cmvn
- enabled OnlineCmvnStats transfer in training/decoding

* OnlineNnet2FeaturePipeline, removing unused constructor, updating results
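Online CMVN normalizes each frame using only statistics accumulated from frames seen so far, typically primed with global stats so early frames are not normalized against near-empty counts. A mean-only, scalar-frame sketch under those assumptions; the real OnlineCmvn class additionally uses a sliding window and speaker-level smoothing:

```python
def online_cmvn(frames, global_mean=0.0, global_count=1.0):
    """Mean-subtract each (scalar) frame using a running mean over the
    frames decoded so far, primed with a global mean so that early
    frames are normalized against non-trivial statistics."""
    total = global_mean * global_count
    count = global_count
    out = []
    for x in frames:
        total += x
        count += 1.0
        out.append(x - total / count)  # causal: uses frames up to and including x
    return out
```

With strong global priming (large `global_count`), frames matching the global mean come out near zero, which is the intended effect for the start of an utterance.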

* [build,src] Change to error message; update kaldi_lm install script (and kaldi_lm) to fix compile issue (kaldi-asr#3581)

* [src] Clarify something in plda.h (cosmetic change) (kaldi-asr#3588)

* [src] Small fix to online ivector extraction (RE kaldi-asr#3401/kaldi-asr#3405), thanks: Vladmir Vassiliev. (kaldi-asr#3592)

Stops an assert failure by changing the assert condition

* [egs] Add recipe for Chinese training with multiple databases (kaldi-asr#3555)

* [src] Remove duplicate `num_done++` in apply-cmvn-online.cc (kaldi-asr#3597)

* [src] Fix to CMVN with CUDA (kaldi-asr#3593)

* Fix the placement of the `__restrict__` qualifier

Fixing a complaint during Windows compilation

* Plumbing for CUDA CMVN

* Removing detritus.

Removing comments, an unused variable and
NULL'ing some object pointers (just in case)

* [src] Fix two bugs in batched-wav-nnet3-cuda binary. (kaldi-asr#3600)

1) Using the "Any" APIs prior to finishing submission of the full group
could lead to a group finishing early.  This would cause output to
appear repeatedly.  I have not seen this occur, but an audit revealed it
as an issue. The fix is to use the "Any" APIs only when full groups have
been submitted.

2) GetNumberOfTasksPending() can return zero even though groups have not
been waited for.  This API call should not be used to determine if all
groups have been completed as the number of pending tasks is independent
of the number of groups remaining.
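The distinction in (2) — pending tasks reaching zero does not imply every group has been waited for — can be illustrated with a toy bookkeeping sketch (the class and method names here are hypothetical, not the pipeline's actual API):

```python
class GroupTracker:
    """Tracks tasks and groups separately, mirroring the pipeline's
    bookkeeping: tasks finish asynchronously, but a group only counts
    as done once wait_group() has been called for it."""
    def __init__(self):
        self.pending_tasks = 0
        self.unwaited_groups = 0

    def submit_group(self, num_tasks):
        self.pending_tasks += num_tasks
        self.unwaited_groups += 1

    def finish_task(self):
        self.pending_tasks -= 1

    def wait_group(self):
        self.unwaited_groups -= 1

t = GroupTracker()
t.submit_group(2)
t.finish_task()
t.finish_task()
# All tasks have finished, yet the group has not been waited for:
assert t.pending_tasks == 0 and t.unwaited_groups == 1
```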

* [scripts] propagated bad whitespace fix to validate_dict_dir.pl (cosmetic change) (kaldi-asr#3601)

* [scripts] bug-fix on subset_data_dir.sh with --per-spk option (kaldi-asr#3567) (kaldi-asr#3572)

The script failed to return the number of utterances requested with `--per-spk`, despite them being available in the original spk2utt file.
The stop boundary of the awk for-loop was off by 1.

> An example of how this affects the spk2utt files:

```$ cat tmpData/spk2utt 
spk1 spk1-utt1 spk1-utt2 spk1-utt3
spk2 spk2-utt1
spk3 spk3-utt1 spk3-utt2
spk4 spk4-utt1 spk4-utt2 spk4-utt3 spk4-utt4
$
$ ./subset_data_dir.sh --per-spk tmpData 2 tmpData-before
$ cat tmpData-before/spk2utt 
spk1 spk1-utt1 
spk2 spk2-utt1 
spk3 spk3-utt1 
spk4 spk4-utt1 spk4-utt3 
$
$ ./subset_data_dir_MOD.sh --per-spk tmpData 2 tmpData-after
$ cat tmpData-after/spk2utt 
spk1 spk1-utt1 spk1-utt2 
spk2 spk2-utt1 
spk3 spk3-utt1 spk3-utt2 
spk4 spk4-utt1 spk4-utt3 
```
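The selection picks utterances evenly spaced through each speaker's list, and the fix amounts to an inclusive loop bound so exactly n indices are taken. A Python sketch of the corrected logic (the helper name `subset_per_spk` is hypothetical; the real script does this in awk):

```python
def subset_per_spk(spk2utt, n):
    """Pick up to n utterances per speaker, evenly spaced through the
    speaker's utterance list, mirroring subset_data_dir.sh --per-spk."""
    out = {}
    for spk, utts in spk2utt.items():
        if len(utts) <= n:
            out[spk] = list(utts)
        else:
            step = len(utts) / n
            # Take exactly n indices: 0, step, 2*step, ... (floored).
            out[spk] = [utts[int(i * step)] for i in range(n)]
    return out
```

This reproduces the "after" behavior shown above: spk1 with 3 utterances and n=2 yields utt1 and utt2, while spk4 with 4 utterances yields utt1 and utt3.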

* [src] Code changes to support GCC9 + OpenFST1.7.3 + C++2a (namespace issues) (kaldi-asr#3570)

* [scripts] Training ivector-extractor: make online-cmvn per speaker. (kaldi-asr#3615)

* [src] cached compiler I/O for nnet3-xvector-compute (kaldi-asr#3197)

* [src] Fixed online2-tcp-nnet3-decode-faster.cc poll_ret checks (kaldi-asr#3611)

* [scripts] Call utils/data/get_utt2dur.sh using the correct $cmd and $nj (kaldi-asr#3608)

* [scripts] Enable tighter control of downloaded dependencies (kaldi-asr#3543) (kaldi-asr#3573)

* [scripts] Make reverberate_data_dir.py handle vad.scp (kaldi-asr#3619)

* [scripts] Don't get utt2dur in librispeech.. will be made by make_mfcc.sh now (kaldi-asr#3610) (kaldi-asr#3620)

* [scripts] Make combine_ali_dirs.sh work when queue.pl is used (kaldi-asr#3537) (kaldi-asr#3561)

* [egs] Fix duplicate removal of unk from Librispeech decoding graphs (kaldi-asr#3476) (kaldi-asr#3621)

* [build] Check for gfortran, needed by OpenBLAS (for lapack) (kaldi-asr#3622)

* [scripts] VB resegmentation: load MFCCs only when used (save memory) (kaldi-asr#3612)

* [build] removed old Docker files - see docker in the root folder for the latest files (kaldi-asr#3558)

* [src] Fix to compute-mfcc-feats.cc (thanks: @puneetbawa) (kaldi-asr#3623)

* [egs] Update AMI tdnn recipe to reflect what was really run, and add hires feats. (kaldi-asr#3578)

* [egs] Fix to run_ivector_common.sh in swbd (crash when running from stage 3) (kaldi-asr#3631)

* [scripts] Make data augmentation work with UTF-8 utterance ids/filenames (kaldi-asr#3633)

* [src] fix a bug in src/online/online-faster-decoder.cc (prevent segfault with some models) (kaldi-asr#3634)

* [scripts]  Make extend_lang.sh support lexiconp_silprob.txt (kaldi-asr#3339) (kaldi-asr#3632)

* [scripts] Fix typo in analyze_alignments.sh (kaldi-asr#3635)

* [scripts] Change the GPU policy of align.sh to wait instead of yes (kaldi-asr#3636)

This is needed to avoid failures when using more jobs than GPUs. Previously, with --use-gpu=yes, a job would error out in 20s when all GPUs were occupied. Changing the policy to wait gives the correct behavior.

* [egs] Added tri3b and chain training for Aurora4 (kaldi-asr#3638)

* [build] fixed broken Docker builds by adding gfortran package (kaldi-asr#3640)

* [src] Fix bug in resampling checks for online features (kaldi-asr#3639)

The --allow-upsample and --allow-downsample options were not handled correctly in the code that handles resampling for computing online features.

* [build] Bump OpenBLAS version to 0.3.7 and enable locking (kaldi-asr#3642)

Bumps OpenBLAS version to 0.3.7 for improved performance with Zen architecture. Also sets USE_LOCKING which fixes issue when calling a single-threaded OpenBLAS from several threads in parallel

* [scripts] Update nnet3_to_dot.py to ignore Scale descriptors (kaldi-asr#3644)

* [src] Speed optimization for decoders (call templates properly) (kaldi-asr#3637)

* [doc] update FAQ page to include some FAQ candidates from Kaldi mailing lists (kaldi-asr#3646)

* [egs] Small fix: duplicate xent settings from examples (kaldi-asr#3649)

* [doc] Fix typo (kaldi-asr#3648)

* [egs] Fix a bug in Tedlium run_ivector_common.sh (kaldi-asr#3647)

Running perturb_data_dir_speed_3way.sh after making MFCCs would cause an error saying "feats.scp already exists". It is also unnecessary to run it twice.

* [src] Fix to matrix-lib-test to ignore small difference (kaldi-asr#3650)

* [doc] update FAQ page: (kaldi-asr#3651)

1. Added some new FAQ candidates from mailing lists.
2. Added sections for the logo, versioning, and book recommendations for beginners.

* [scripts] Modify split_data.sh to split data evenly when utt2dur exists (kaldi-asr#3653)
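Splitting evenly by duration rather than by utterance count can be sketched as a greedy assignment to the currently lightest split. This is an illustrative sketch only, not the actual split_data.sh / split_scp.pl algorithm:

```python
def split_by_duration(utt2dur, num_splits):
    """Greedily assign each utterance (longest first) to the split with
    the smallest total duration so far, balancing splits by duration
    rather than by utterance count."""
    splits = [[] for _ in range(num_splits)]
    totals = [0.0] * num_splits
    for utt, dur in sorted(utt2dur.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # lightest split so far
        splits[i].append(utt)
        totals[i] += dur
    return splits
```

For example, durations {a: 3.0, b: 1.0, c: 2.0} over two splits land as [a] and [c, b], giving 3.0 seconds per split.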

* [doc] update FAQ page: added section for free dataset, python wrapper, etc. (kaldi-asr#3652)

* [src] Some CUDA i-vector fixes (kaldi-asr#3660)

no longer assume we are starting at frame 0.  This is needed for eventual online decoding.

  src/cudafeat/online-ivector-feature-cuda.cc:
    Added code to use LU decomposition for debugging.  Cholesky is still used and works fine, but this is a good test if something wrong is suspected in the solver.
    Added support for older toolkits, but with a less efficient algorithm.

* [src] CUDA batched decoder pipeline fixes (kaldi-asr#3659)

bug fix: don't assume the stride and the number of columns are the same when packing a matrix into a vector.

  src/cudadecoder/batched-threaded-nnet3-cuda-pipeline.cc:
    added additional sanity checking to better report errors in the field.
    no longer pinning memory for copying waves down.  This was causing consistency issues.  It is unclear why the code is not working; we will continue to evaluate.
    This optimization doesn't add a lot of perf, so we are disabling it for now.
    general cleanup
    fixed a bug where the tasks array could be resized before being read.

  src/cudadecoderbin/batched-wav-nnet3-cuda.cc:
    Now outputting every iteration as a different lattice.  This way we can score every lattice and better ensure the correctness of the binary.
    clang-format (removing tabs)

* [egs] Small fix to aspire example: pass in num_data_reps (kaldi-asr#3665)

* [doc] Update FAQ page (kaldi-asr#3663)

1. Added a section on Android for Kaldi
2. Listed some useful scripts for data processing
3. Illustrated some common tools

* [src] Add batched xvector computation (kaldi-asr#3643)

* [src] CUDA decoder: write all output to single stream (kaldi-asr#3666)

.. instead of one per iteration.

We are seeing that runs on small corpora are dominated by writer Open/Close costs.

A better solution is to modify the key with the iteration number and
then handle the modified keys in scoring.  Note that if you have just a
single iteration the key is not modified and thus behavior is as
expected.
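The key-modification scheme described above can be sketched in a few lines; the exact key format (utterance id plus an iteration suffix) is an assumption here:

```python
def output_key(utt_id, iteration, num_iterations):
    """Append the iteration number to the key only when there is more
    than one iteration, so single-iteration behavior is unchanged and
    scoring can disambiguate repeated utterances otherwise."""
    if num_iterations == 1:
        return utt_id
    return f"{utt_id}-{iteration}"
```

With one iteration the key passes through untouched, so existing scoring scripts keep working.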

* [egs] Fix sox command in multi_en setup (updated sox command) (kaldi-asr#3667)

* [src] Change data-type for hash in lattice alignment (avoid debugger complaints)  (kaldi-asr#3672)

* [egs] Two fixes to multi_en setup  (kaldi-asr#3676)

* [scripts] Fix misspelled variable in reverb script (kaldi-asr#3678)

* Update cudamatrix.dox (kaldi-asr#3674)

* [scripts] Add reduced-context option for TDNN-F layers. (kaldi-asr#3658)

* [egs] Make librispeech run_tdnn_discriminative.sh more efficient with disk (kaldi-asr#3685)

* [egs] Remove pitch from multi_cn nnet3 recipe (kaldi-asr#3686)

* [egs] multi_cn: clarify metric is CER (kaldi-asr#3687)

* [src,minor] Fix typo in comment in kaldi-holder.h (kaldi-asr#3691)

* [egs] update run_tdnn_discriminative.sh in librispeech recipe (kaldi-asr#3689)

* [egs] Aspire recipe: fixed typo in utterance and speaker prefix (kaldi-asr#3696)

* [src] cuda batched decoder, fix memory bugs (kaldi-asr#3697)

bug fix:  Pass the token by reference so we can return an address of it.
Previously it was passed by value, which meant we returned the address of
a locally scoped variable. This led to memory corruption.

re-add pinned memory (i.e. disable workaround)

Free memory on control thread, switch shared_ptr to unique_ptr, don't
reset unique ptr.

* [scripts] Change make_rttm.py to read/write files with UTF-8 encoding (kaldi-asr#3705)

* [src] Removing non-compiling paranoid asserts in nnet-computation-graph. (kaldi-asr#3709)

* [build] Fix gfortran package name for centos (kaldi-asr#3708)

* [scripts] Change the Python diagnostic scripts to accept non-ASCII UTF-8 phone set (kaldi-asr#3711)

* [scripts] Fix some issues in kaldi-asr#3653 in split_scp.pl (kaldi-asr#3710)

* [scripts] Fix 2 issues in nnet2->nnet3 model conversion script (kaldi-asr#886)(kaldi-asr#3713)

Parsing learning rates would fail if they were written in e-notation (e.g.
1e-5). Also fixes ivector-dim=0, which could result if the const component
dim was 0 in the source nnet2 model.
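The e-notation failure is typical of extracting floats with a digits-and-dot pattern; Python's `float()` accepts e-notation natively. A hedged sketch of robust parsing (the `learning-rate=` line format here is illustrative, not the actual nnet2 model format):

```python
import re

def parse_learning_rate(text):
    """Extract the learning rate from a line like 'learning-rate=1e-05'.
    Grabbing the whole token and letting float() parse it handles both
    plain decimals and e-notation, unlike a \\d+\\.\\d+ style regex."""
    m = re.search(r'learning-rate=(\S+)', text)
    return float(m.group(1)) if m else None
```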

* Removing changes to split_scp.pl (kaldi-asr#3717)

* [scripts] Improve how combine_ali_dirs.sh gets job-specific filenames (kaldi-asr#3720)

* [src] Add --debug-level=N configure option to control assertions (kaldi-asr#3690) (kaldi-asr#3700)

* [src] Adding some more feature extraction options (needed by some users..) (kaldi-asr#3724)

* [src,script,egs] Goodness of Pronunciation (GOP) (kaldi-asr#3703)

* [src] Making ivector extractor tolerate dim mismatch due to pitch (kaldi-asr#3727)

* Revert "[src] Making ivector extractor tolerate dim mismatch due to pitch (kaldi-asr#3727)" (kaldi-asr#3728)

This reverts commit 59255ae.

* [src] Fix NVCC compilation errors on Windows (kaldi-asr#3741)

* make nvcc+msvc happier when two-phase name lookup is involved;
NOTE: in the simple native case (CPU, without CUDA), msvc is happy with TemplatedBase<T>::base_value, but nvcc is not...
* the position of __restrict__ in msvc is restricted

* [build] Add CMake Build System as alternative to current Makefile-based build (kaldi-asr#3580)

* [scripts] Modify split_data_dir.sh and split_scp.pl to use utt2dur if present, to balance splits

* [scripts] fix slurm.pl error (kaldi-asr#3745)

* Revert "[scripts] Modify split_data_dir.sh and split_scp.pl to use utt2dur if present, to balance splits" (kaldi-asr#3746)

This reverts commit 1d0b267.

* [egs] Children's speech ASR recipe for cmu_kids and cslu_kids (kaldi-asr#3699)

* [src] Incremental determinization [cleaned up/rewrite] (kaldi-asr#3737)

* [scripts] Add scripts to create combined fmllr-transform dirs (kaldi-asr#3752)

* [src] CUDA decoder: fix invalid-lattice error that happens in corner cases (kaldi-asr#3756)

* [egs] Add Chime 6 baseline system (kaldi-asr#3755)

* [scripts] Fix issue in copy_lat_dir.sh affecting combine_lat_dirs.sh (missing phones.txt) (kaldi-asr#3757)

* [src] Add missing #include, needed for CUDA decoder compilation on some platforms (kaldi-asr#3759)

* [scripts] fix bug in steps/data/reverberate_data_dir.py (kaldi-asr#3762)

* [src] CUDA allocator: fix order of next largest block (kaldi-asr#3739)

* [egs] some fixes in Chime6 baseline system (kaldi-asr#3763)

* changed scoring tool for diarization

* added comment for scoring

* fixing the number of deletions; adding a script to check that the DP result of the total errors equals the sum of the individual errors

* updated RESULTS for new diarization scoring

* outputting WER similar to the compute-wer routine

* adding a routine to select the best LM weight and insertion penalty based on the development set

* updating results

* changing lang_chain to lang, minor fix

* adding all array option

* change in directory structure of scoring_kaldi_multispeaker to make it similar to scoring_kaldi

* removing test sets from run ivector script

* added ref RTTM creation

* making modifications for all array

* minor fix

* [src] CUDA decoding: add support for affine transforms to CUDA feature extractor (kaldi-asr#3764)

* [src] relax assertion constraint slightly (RE matrix orthonormalization) (kaldi-asr#3767)

* [src] CUDA decoder: fix bug in NumPendingTasks()  (kaldi-asr#3769)

* [src] Add options to select specific gpu, reuse cuda context (kaldi-asr#3750)

* [src] Move CheckAndFix to config struct (kaldi-asr#3749)

* [egs,scripts] Add recipes for CN-Celeb (kaldi-asr#3758)

* [src] CUDA decoder: remove unecessary sync that was added for debugging (kaldi-asr#3770)

* [src] CUDA decoder: shrink channel vectors instead of vector holding those vectors (kaldi-asr#3768)

* add include path

* update test