Made changes to the augmentation script to make it work for ASR and speaker ID #3119

Merged

Conversation

phanisankar-nidadavolu
Contributor

No description provided.

@danpovey
Contributor

@david-ryan-snyder please LMK if this is good to merge

@phanisankar-nidadavolu changed the title from "Made changes to the augmentation script make it work for ASR and speaker ID" to "Made changes to the augmentation script to make it work for ASR and speaker ID" on Mar 15, 2019
Contributor

@danpovey danpovey left a comment


Some smallish comments. Big question is what to do with ivectors and CMN and whether to make those changes at the same time as this.


mkdir -p local/musan.tmp

echo "Preparing ${data_dir}/musan..."
Contributor

I think I'd like this script to exit with an error if it detects that what it is trying to create already exists.

# Copyright 2015 David Snyder
# Apache 2.0.
#
# This file is meant to be invoked by make_musan.sh.
Contributor

Since we are re-using this script I think it would be better to work on it a bit more carefully-- e.g. use argparse to parse the command line args (with something more standard than the 'Y' to use the vocals, i.e. make it a boolean flag)-- and have a proper usage message printed out if called wrongly.
Also, I think we can put this in a shared place, steps/augmentation/ or wherever, since other scripts will likely want to call it. The same goes for the shell script-- let's put it somewhere shared, e.g. in the same place; have it check whether its input already exists; give it a usage message; and have it parse command line args properly (e.g. give it a --use-vocals option). Also some kind of function-level documentation for this python script would be nice (e.g. a doc-string).
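A sketch of what that could look like, assuming argparse: the `--use-vocals` option matches the suggestion above, but the positional-argument names here are illustrative, not the script's final interface. The string-valued true/false convention follows other Kaldi python scripts.

```python
import argparse

def make_parser():
    """Build an argparse parser for a make_musan-style script (sketch).

    Positional argument names are illustrative only.
    """
    parser = argparse.ArgumentParser(
        description="Prepare a MUSAN data directory for augmentation.")
    parser.add_argument("--use-vocals", type=str, default="true",
                        choices=["true", "false"],
                        help="If true, also include music files with vocals.")
    parser.add_argument("in_dir", help="Path to the MUSAN corpus")
    parser.add_argument("out_dir", help="Output data directory")
    return parser

# usage: a proper usage message is printed automatically on bad input
args = make_parser().parse_args(["--use-vocals", "false",
                                 "musan/", "data/musan"])
use_vocals = (args.use_vocals == "true")
```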

@@ -0,0 +1,242 @@
#!/bin/bash

Contributor

Put the normal Apache header on this please, and your name.

Contributor

Also can you please add an arg use_ivectors=true that will give the option to skip all the i-vector-related parts of the script? We may want to use this option later on when we switch to online CMN, but for now I suppose we can leave that issue separate; there is no need to mix it up with this.

fi

if [ "$multi_style" == "true" ]; then
  if [ $stage -le 1 ]; then
Contributor

I think you can break this up into more stages, in case it dies somewhere in the middle. And please have more of the stages check whether their work has already been done and die with a suitable message if so.


if [ -e data/rt03 ]; then maybe_rt03=rt03; else maybe_rt03= ; fi

if [ "$multi_style" == "true" ]; then
Contributor

I think we should remove the multi-style option here, for purposes of experiments, since if people didn't want it, they just shouldn't call this. I know this was useful for you... you can of course use it for your own experiments, but I would have thought just setting noise_list to "clean" would do what you want.
Also I don't like the name "noise_list" as reverb and clean are not noise. Let's call it "augmentation_list" wherever it occurs.

else:
    raise Exception("Trying to add both prefix and suffix. Choose either of them")

args.modify_spkr_id = (True if args.modify_spkr_id == "true" else False)
Contributor

An example of how to register boolean options:

    parser.add_argument("--chain.apply-deriv-weights", type=str,
                        dest='apply_deriv_weights', default=True,
                        action=common_lib.StrToBoolAction,
                        choices=["true", "false"],
                        help="")
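For reference, a minimal equivalent of that action class (Kaldi's real one lives in steps/libs/common.py; this sketch just shows the mechanism of converting the strings at parse time):

```python
import argparse

class StrToBoolAction(argparse.Action):
    """Minimal sketch of an argparse action that converts the strings
    'true'/'false' into Python booleans when the option is parsed."""
    def __call__(self, parser, namespace, values, option_string=None):
        # choices=["true", "false"] has already validated the value here.
        setattr(namespace, self.dest, values == "true")

parser = argparse.ArgumentParser()
parser.add_argument("--modify-spk-id", type=str, dest="modify_spk_id",
                    default=False, action=StrToBoolAction,
                    choices=["true", "false"],
                    help="Also modify the speaker id (used in ASR).")
args = parser.parse_args(["--modify-spk-id", "true"])
```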

parser.add_argument('--random-seed', type=int, dest = "random_seed", default = 123, help='Random seed.')

parser.add_argument('--modify-spkr-id', type=str, default = "false", choices=["true", "false"], help='Utt prefix or suffix would be added to the speaker id also (Used in ASR), in speaker id it is left unmodifed' )
Contributor

let's call this --modify-spk-id .. more consistent with spk2utt, etc.

CopyFileIfExists(input_dir + "/reco2file_and_channel", output_dir + "/reco2file_and_channel", args.utt_modifier_type, args.utt_modifier, fields=[0, 1])

if args.modify_spkr_id:
    CopyFileIfExists(input_dir + "/spk2gender", output_dir + "/spk2gender", args.utt_modifier_type, args.utt_modifier)
Contributor

please try to mostly keep within 80 or 100 characters by breaking long lines.

def CopyFileIfExists(utt_suffix, filename, input_dir, output_dir):
    if os.path.isfile(input_dir + "/" + filename):
        dict = ParseFileToDict(input_dir + "/" + filename,
# This function generates a new id from the input id
Contributor

It's good that you are adding documentation, but let's do it with a doc-string, e.g.

def Foo(string):
    """Foo is a function that does nothing.

    'string' is expected to be of type str, but is otherwise ignored.
    This function returns None.
    """
    return None

if utt in utt2wav:
    if use_vocals or not utt2vocals[utt]:
        utt2spk_str = utt2spk_str + utt + " " + utt2spk[utt] + "\n"
        utt2wav_str = utt2wav_str + utt + " sox -t wav " + utt2wav[utt] + " -r 8k -t wav - |\n"
Contributor

I think this script can be used for 16000 Hz sampled audio too; how about making 8k a parameter?
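One way to do that, as a sketch (the function name and default are illustrative, not part of the script): pass the rate in and build the sox pipe string from it instead of hard-coding 8k.

```python
def make_wav_pipe(wav_path, sample_rate=8000):
    """Build a wav.scp pipe entry that resamples with sox (sketch).

    The default of 8000 matches the SWBD recipe; 16000 could be
    passed for wideband setups.
    """
    return "sox -t wav {0} -r {1} -t wav - |".format(wav_path, sample_rate)

# usage: a 16 kHz entry for a hypothetical recording id
entry = "music-001 " + make_wav_pipe("music-001.wav", sample_rate=16000)
```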

@danpovey
Contributor

@david-ryan-snyder or @phanisankar-nidadavolu, perhaps you can comment? And @phanisankar-nidadavolu, let me know if I should be merging this now.

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 18, 2019 via email

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 18, 2019 via email

@david-ryan-snyder
Contributor

The make_musan.py script is just a data prep script in local. Since SWBD uses audio sampled at 8 kHz, we know what the sample rate should be for this script. I don't see why this needs to be an option.

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@david-ryan-snyder
Contributor

If we consider MUSAN important enough to put into steps/ it makes sense to change the usage so that the sample rate is an option. If you go that route, you should be sure to change all of the recipes that use MUSAN (that will include all of the x-vector recipes) so that it uses the new script in steps/ and also be sure to remove the old script in local.

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@david-ryan-snyder
Contributor

Moving it to steps/augmentation or steps/data sounds like a good idea to me.

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@danpovey
Contributor

danpovey commented Apr 18, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 18, 2019 via email

…r on augmented data, moved musan scripts to steps/data and other script level changes that Dan suggested in the PR review earlier
@david-ryan-snyder
Contributor

@phanisankar-nidadavolu, is this PR done and ready to be reviewed?

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented Apr 22, 2019 via email

data/${train_set} data/${train_set}_babble

# Combine all the augmentation dirs
# This part can be simplified once we know what noise types we will add
Contributor

Is this comment still relevant?

Contributor Author

I think it is still relevant for now. We have not arrived at a general conclusion yet.


if [ $# -ne 2 ]; then
  echo USAGE: $0 input_dir output_dir
  echo input_dir is the path where the original musal corpus is located
Contributor

"original musal" -> "MUSAN"

# This script creates the MUSAN data directory.
# Consists of babble, music and noise files.
# Used to create augmented data
# The required dataset is freely available at http://www.openslr.org/17/
Contributor

@david-ryan-snyder david-ryan-snyder Apr 22, 2019

Could you add a comment that says something like:

# The corpus can be cited as follows:
# @misc{musan2015,
#  author = {David Snyder and Guoguo Chen and Daniel Povey},
#  title = {{MUSAN}: {A} {M}usic, {S}peech, and {N}oise {C}orpus},
#  year = {2015},
#  eprint = {1510.08484},
#  note = {arXiv:1510.08484v1}
# }

@@ -0,0 +1,256 @@
#!/bin/bash
Contributor

I think this script should go into a new tuning directory, e.g., local/chain/multi_style/tuning/run_tdnn_1a.sh.

Then, in local/chain/multi_style, I suggest creating a symbolic link to run_tdnn_1a.sh called run_tdnn.sh. That would be analogous to what is done in other recipes, such as https://github.com./kaldi-asr/kaldi/tree/master/egs/swbd/s5c/local/chain .

Also, I suggest adding a more detailed comment at the top of this script that also includes some WER results. For example, see https://github.com./kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_lstm_1n.sh.

Contributor Author

Makes sense, but instead of creating the symlink in local/chain/multi_style it is better to create it as local/chain/run_tdnn_multistyle.sh, since all the other tdnn scripts are located there. What do you think?

Contributor

That sounds better.

Contributor Author

I made the changes except for the inclusion of WERs. I will include them once the tdnn finishes training. In my old experiments I never decoded on train_dev.

@danpovey
Contributor

danpovey commented Apr 23, 2019 via email

@danpovey
Contributor

danpovey commented Apr 23, 2019 via email

@david-ryan-snyder
Contributor

I have two more comments on the aesthetics of the PR while you're waiting for the WERs.

I don't think we need the multi_style directory. This doesn't matter much for this particular PR, but as we start adding more scripts that use augmentation, deciding where these scripts will go will become more important. As I understand it, we plan on augmentation becoming part of the standard ASR recipe. I think it makes more sense to put these scripts in the same place people are used to finding them, which is in chain/tuning.
My suggestion is to move the new script to local/chain/tuning/run_tdnn_multistyle_1a.sh and make local/chain/run_tdnn_multistyle.sh be a symbolic link to that.

Also, in my opinion, it's better to refer to these recipes using the word "augmentation" rather than "multi style." In my opinion, using some form of the word "augment" makes it immediately clear what you're doing, and then the terminology would be consistent with the new directory steps/data/augmentation that you're adding. Also, "multi style" sounds old fashioned to me. What do you think?

@david-ryan-snyder
Contributor

david-ryan-snyder commented Apr 25, 2019

During the meeting today, @danpovey suggested that we use the word "aug" for the scripts in this recipe instead of "multi style" or "augmentation." Seems nice and concise, and everyone should know what that means at first glance, I think.

@vimalmanohar
Contributor

I think it is better to call it multi_condition or something to be consistent with existing recipes (in Aspire, AMI and Babel and probably many others) that already use that name for the same purpose. The scripts should go to local/multi_condition or local/chain/multi_condition/.

@david-ryan-snyder
Contributor

@vimalmanohar, my objection to this is that augmentation is now going to become a standard part of all or most of our ASR recipes. I think it's going to be cumbersome to propagate this terminology into the names of scripts for future recipes. If we need a way to distinguish between the past recipes without augmentation, and new recipes that use augmentation, adding the suffix "aug" to the new scripts is a concise (and I think, more intuitive) way to indicate what the change is.

If we have to go with multi_*, I agree multi_condition is better than multi_style since it already exists.

#!/bin/bash

noise_list="reverb1:babble:music:noise"
max_jobs_run=50
Contributor

I think it is better to create a generic version of the script in steps, like steps/copy_ali_dir.sh, with an option such as --prefixes. It would be painful to keep copying this script to every new recipe.

. utils/parse_options.sh

if [ $# -ne 3 ]; then
  echo "Usage: $0 <out-data> <src-ali-dir> <out-ali-dir>"
Contributor

Same thing for this script. Also change ali-dir to lat-dir and give a description of what this script does. <out-data> seems like an inappropriate name as it is actually an input.

cat $dir/lat_tmp.*.scp | awk -v p=$p '{print p$0}'
done | sort -k1,1 > $dir/lat_out.scp.noise

cat $dir/lat_tmp.*.scp | awk '{print $0}' | sort -k1,1 > $dir/lat_out.scp.clean
Contributor

If clean data also needs to be added, then add another option for this, such as --include-original. This type of option is already used in one of the older scripts.

. ./cmd.sh

set -e
stage=0
Contributor

This script should also be called local/nnet3/multi_condition/run_ivector_common.sh. It is the same as the one that is in Aspire, AMI and Babel but uses a new script for augmentation.

Contributor Author

We have made use_ivectors an optional argument. I am not sure run_ivector_common.sh is a valid name in that case. We either have to remove use_ivectors flag and make training of ivectors a default option or name the script prepare_aug_data.sh. Let me know what you think.

Contributor

Hm, maybe prepare_aug_data.sh is OK then. We may not end up using ivectors in all cases. I don't have super strong opinions about that.

Contributor

Maybe run_aug_common.sh?

from reverberate_data_dir import ParseFileToDict
from reverberate_data_dir import WriteDictToFile
import libs.common as common_lib
data_lib = imp.load_source('dml', 'steps/data/data_dir_manipulation_lib.py')
Contributor

I think if this script is being created new, it would be better to stick to the PEP8 standards, which includes using function names like get_args() instead of GetArgs().
But more importantly I think some of the things like imp.load_source may not work in python3. Or at least, it is very old style and should be modified to use import data_dir_manipulation_lib as data_lib.

sys.exit()

fg_snrs = [int(i) for i in args.fg_snr_str.split(":")]
bg_snrs = [int(i) for i in args.bg_snr_str.split(":")]
num_bg_noises = [int(i) for i in args.num_bg_noises.split(":")]
reco2dur = ParseFileToDict(input_dir + "/reco2dur",
Contributor

I think instead of assuming this exists, it is better to first call utils/data/get_reco2dur.sh. Otherwise this is going to be called once for each augmentation that will be done.

Contributor Author

Hi Vimal, the wrapper script prepare_multistyle_data.sh (now changed to run_ivector_aug.sh) first creates reco2dur file in the input directory and then we call this script to augment. I guess it is safe to assume that this file exists (at least for this recipe). Do you want me to modify the script to first check whether the file exists or not, and create the file if it does not exist?

Contributor

You can add it to this script. utils/data/get_reco2dur.sh will already check if the file exists and create it only if not, so you can simply call that script.

@@ -84,9 +84,6 @@ def GetArgs():
return args

def CheckArgs(args):
    if not os.path.exists(args.output_dir):
Contributor

If this is removed, then the script will fail if the output_dir exists.

Contributor Author

@vimalmanohar I don't see why. This part of the code is only creating output dir if it does not exist. Anyways, I am modifying the code to fail if the output directory already exists.

Contributor

I don't think Vimal was asking for it to fail if the output dir exists (and I don't recommend it either).
I think he's pointing out that since, below, you do os.makedirs with no such guard, it will fail if the directory already exists. os.makedirs has an exist_ok parameter which you can set to true.
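That suggestion as a short sketch (exist_ok requires Python >= 3.2; the function name and paths are illustrative):

```python
import os
import tempfile

def ensure_output_dir(output_dir):
    """Create the output dir; with exist_ok=True a re-run of the
    script does not crash when the directory is already there."""
    os.makedirs(output_dir, exist_ok=True)

# Safe to call twice; without exist_ok=True the second call
# would raise FileExistsError.
d = os.path.join(tempfile.mkdtemp(), "aug_out")
ensure_output_dir(d)
ensure_output_dir(d)
```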

Contributor

The original version of the script will re-create the augmented data dir if the script is run again. But Phani modified it so it would do nothing if run again, which I think is unexpected behavior. It should either 1) re-create the augmented data dir, 2) fail with error saying it already exists.

@@ -653,6 +658,12 @@ def Main():
pointsource_noise_addition_probability = args.pointsource_noise_addition_probability,
max_noises_per_minute = args.max_noises_per_minute)

else:
    print("Directory {0} already exists, not creating it again".format(args.output_dir))
Contributor

I don't think it should skip creating it if it's already there. I think it should either recreate it or fail with an error saying it already exists.

@danpovey
Contributor

danpovey commented Apr 26, 2019 via email

@vimalmanohar
Contributor

I guess it makes sense then to use augmentation if all the new recipes have it. But perhaps it is better to have the new recipes in local/chain/augmentation/tuning and then create a symlink in local/chain or local/chain/augmentation.

@danpovey
Contributor

danpovey commented Apr 26, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented May 1, 2019 via email

@phanisankar-nidadavolu
Contributor Author

phanisankar-nidadavolu commented May 1, 2019 via email

@phanisankar-nidadavolu force-pushed the augmentation-script-asr-spkrid branch from 3418b62 to 986aebe on May 6, 2019 13:41
@phanisankar-nidadavolu
Contributor Author

Hello, the PR is updated with the changes that you suggested. Main changes made: renaming to aug, moving around files, adding results and some other stuff.

@danpovey danpovey merged commit a861e56 into kaldi-asr:master May 13, 2019
danpovey pushed a commit that referenced this pull request May 16, 2019
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
… for ASR and speaker ID (kaldi-asr#3119)

Now multi-style training with noise and reverberation is an option (instead of speed augmentation).
Multi-style training seems to be more robust to unseen/noisy conditions.
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Jun 19, 2019
danpovey pushed a commit to danpovey/kaldi that referenced this pull request Dec 17, 2019
* [egs] New chime-5 recipe (kaldi-asr#2893)

* [scripts,egs] Made changes to the augmentation script to make it work for ASR and speaker ID (kaldi-asr#3119)

Now multi-style training with noise and reverberation is an option (instead of speed augmentation).
Multi-style training seems to be more robust to unseen/noisy conditions.

* [egs] updated local/musan.sh to steps/data/make_musan.sh in speaker id scripts (kaldi-asr#3320)

* [src] Fix sample rounding errors in extract-segments (kaldi-asr#3321)

With a segments file constructed from exact wave file durations
some segments came out one sample short. The reason is the
multiplication of the float sample frequency and double audio
time point is inexact. For example, float 8000.0 multiplied
by double 2.03 yields 16239.99999999999, one LSB short of the
correct sample number 16240.

Also changed all endpoint calculations so that they are performed
in seconds, not sample numbers, as this does not require a
conversion in nearly every comparison, and positions in
diagnostic messages are also reported in seconds, not sample numbers.
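The rounding effect described in that commit can be reproduced directly; Python floats are IEEE doubles, which is also the promoted type of the float-times-double product in the C++ code (8000.0 is exactly representable in float, so no extra error comes from the promotion):

```python
# 2.03 is not exactly representable in binary floating point; the
# nearest double is slightly below 2.03, so the product lands just
# under the exact 16240 samples and a truncating cast loses a sample.
t = 8000.0 * 2.03
assert t < 16240.0
assert int(t) == 16239      # truncation: one sample short
assert round(t) == 16240    # rounding recovers the correct count
```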

* [src,scripts]Store frame_shift, utt2{dur,num_frames}, .conf with features (kaldi-asr#3316)

Generate utt2dur and utt2num_frames during feature extraction,
and store frame period in frame_shift file in feature directory.

Copy relevant .conf files used in feature extraction into
the conf/ subdirectory with features.

Add missing validations and options in some extraction scripts.

* [build] Initial version of Docker images for (CPU and GPU versions) (kaldi-asr#3322)

* [scripts] fix typo/bug in make_musan.py (kaldi-asr#3327)

* [scripts] Fixed misnamed variable in data/make_musan.py (kaldi-asr#3324)

* [scripts] Trust frame_shift and utt2num_frames if found (kaldi-asr#3313)

Getting utt2dur involves accessing wave files, and potentially
running full pipelines in wav.scp, which may take hours for a
large data set. If utt2num_frames exists, use it instead if
frame rate is known.

Issue: kaldi-asr#3303
Fixes: kaldi-asr#3297 "cat: broken pipe"

* [scripts] typo fix in augmentation script (kaldi-asr#3329)

Fixes typo in kaldi-asr#3119

* [scripts] handle frame_shift and utt2num_frames in utils/ (kaldi-asr#3323)

subset_data_dir.sh has been refactored thoroughly so that its
logic can be followed easier. It has been well tested and
dogfooded.

All changes here are necessary to subset, combine and verify
utt2num_frames, and copy frame_shift to new directories where
necessary.

* [scripts] Extend combine_ali_dirs.sh to combine alignment lattices (kaldi-asr#3315)

Relevant discussion:
https://groups.google.com/forum/#!topic/kaldi-help/2uxfByEAmfw

* [src] Fix rare case when segment end rounding overshoots file end in extract-segments (kaldi-asr#3331)

* [scripts] Change --modify-spk-id default to False; back-compatibility fix for kaldi-asr#3119 (kaldi-asr#3334)

* [build] Add easier configure option in failure message of configure (kaldi-asr#3335)

* [scripts,minor] Fix typo in comment (kaldi-asr#3338)

* [src,egs] Add option for applying SVD on trained models (kaldi-asr#3272)

* [src] Add interfaces to nnet-batch-compute that expects device input. (kaldi-asr#3311)

This avoids a ping pong of memory to host.

Implementation now assumes device memory.  interfaces will allocate
device memory and copy to it if data starts on host.

Add a cuda matrix copy function which clamps rows.  This is much
faster than copying one row at a time and the kernel can handle the
clamping for free.

* [build] Update GCC support check for CUDA toolkit 10.1 (kaldi-asr#3345)

* [egs] Fix to aishell1 v1 download script (kaldi-asr#3344)

* [scripts] Support utf-8 files in some scripts (kaldi-asr#3346)

* [src] Fix potential underflow bug in MFCC, RE energy floor, thx: Zoltan Tobler (kaldi-asr#3347)

* [scripts]: add warning to nnet3/chain/train.py about ineffective options (kaldi-asr#3341)

* [scripts] Fix regarding UTF handling in cleanup script (kaldi-asr#3352)

* [scripts] Change encoding to utf-8 in data augmentation scripts (kaldi-asr#3360)

* [src] Add CUDA accelerated MFCC computation. (kaldi-asr#3348)

* Add CUDA accelerated MFCC computation.

Creates a new directory 'cudafeat' for placing cuda feature extraction
components as it is developed.  Added a directory 'cudafeatbin' for
placing binaries that are cuda accelerated that mirror binaries
elsewhere.

This commit implements:
  feature-window-cuda.h/cu which implements a feature window on the device
    by copying it from a host feature window.
  feature-mfcc-cuda.h/cu which implements the cuda mfcc feature
    extractor.
  compute-mfcc-feats-cuda.cc which mirrors compute-mfcc-feats.cc

  There were also minor changes to other files.

* Only build cuda binaries if cuda is enabled

* [src] Optimizations for batch nnet3.  The issue fixed here is that (kaldi-asr#3351)

small cuda memory copies are inefficient because each copy can
add multiple microseconds of latency. The code as written
would copy small matrices or vectors to and from the tasks one
after another. To avoid this I've implemented a batched matrix
copy routine.  This takes arrays of matrix descriptions for the
input and output and batches the copies in a single kernel call.
This is used in both FormatInputs and FormatOutputs to reduce
launch latency overhead.

The kernel for the batched copy uses a trick to avoid a memory
copy of the host parameters.  The parameters are put into a struct
containing a static sized array.  These parameters are then marshalled
like normal cuda parameters.  This avoids additional launch latency
overhead.

There is still more work to do at the beginning and end of nnet3.
In particular we may want to batch the clamped memory copies and
the large number of D2D copies at the end.  I haven't fully tracked
those down and may return to them in the future.

* [scripts,minor] Remove outdated comment (kaldi-asr#3361)

* [egs] A kaldi recipe based on the corpus named "aidatatang_200zh". (kaldi-asr#3326)

* [src] nnet1: changing end-rule in 'nnet-train-multistream', (kaldi-asr#3358)

- end the training when there is no more data to refill one of the streams,
- this avoids overtraining to the 'last' utterance,

* [scripts] Fix how the empty (faulty?) segments are handled in data-cleanup code (kaldi-asr#3337)

* [src] Fix to bug in ivector extraction causing assert failure, thx: sray (kaldi-asr#3364)

* [src] Fix to bug in ivector extraction causing assert failure, thx: sray (kaldi-asr#3365)

* [scripts] add script to compute dev PPL on kaldi-rnnlm (kaldi-asr#3340)

* [scripts,egs] Small fixes to diarization scripts (kaldi-asr#3366)

* [egs] Modify split_scp.pl usage to match its updated code (kaldi-asr#3371)

* [src] Fix non-cuda `make depend` build by putting compile guards around header. (kaldi-asr#3374)

* [build] Docker docs update and minor changes to the Docker files  (kaldi-asr#3377)

* [egs] Scripts for MATERIAL ASR (kaldi-asr#2165)

* [src] Batch nnet3 optimizations.  Batch some of the copies in and copies out (kaldi-asr#3378)

* [build] Widen cuda guard in cudafeat makefile. (kaldi-asr#3379)

* [scripts] nnet1: updating the scripts to support 'online-cmvn', (kaldi-asr#3383)

* [build,src] Enhancements to the cudamatrix/cudavector classes. (kaldi-asr#3373)

* Added CuSolver to the matrix class.  This is only supported with
Cuda 9.1 or newer.  Calling CuSolver code without Cuda 9.1 or newer
will result in a runtime error.

This change required some changes to the build system which requires
versioning the configure script. This forces everyone to reconfigure.
Failure to reconfigure would result in linking and build errors on
some systems.

* [egs] Fix perl `use encoding` deprecation (kaldi-asr#3386)

* [scripts] Add max_active to align_fmllr_lats.sh to prevent rare crashes (kaldi-asr#3387)

* [src] Implemented CUDA accelerated online cmvn. (kaldi-asr#3370)

This patch is part of a larger effort to implement the entire online feature pipeline in CUDA so that wav data is transferred to the device and never copied back to the host.
This patch includes a new binary cudafeatbin/apply-cmvn-online.cc which for the most part matches online2bin/apply-cmvn-online.
This binary is primarily for correctness testing and debugging as it makes no effort to compute multiple features in parallel on the device.
The CUDA performance is dominated by the cost of copying the feature to and from the device. While there is a small speedup I do not expect this binary to be used in production.
Instead users will use the upcoming online-pipeline which will take features directly from the mfcc computation on the device and pass results to the next part of the pipeline.

Summary of changes:

Makefile:
   Added online2 dependencies to cudafeat, cudafeatbin, cudadecoder, and cudadecoderbin.
cudafeat:
   Makefile:  added online2 dependency, added new .cu/.h files
   feature-online-cmvn-cuda.cu/h:  implements online-cmvn in CUDA.
cudafeatbin:
   Makefile:  added new binary, added online2 dependency
   apply-cmvn-online-cuda.cc:  binary which mimics online2bin/apply-cmvn-online

Correctness testing:

The correctness was tested by generating a set of 20000 features, then running the CPU binary and the GPU binary and comparing results using featbin/compare-feats.

../online2bin/apply-cmvn-online /workspace/models/LibriSpeech/ivector_extractor/global_cmvn.stats "scp:mfcc.scp" "ark,scp:cmvn.ark,cmvn.scp"
./apply-cmvn-online-cuda /workspace/models/LibriSpeech/ivector_extractor/global_cmvn.stats "scp:mfcc.scp" "ark,scp:cmvn-cuda.ark,cmvn-cuda.scp"

../featbin/compare-feats ark:cmvn-cuda.ark ark:cmvn.ark
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:105) self-product of 1st features for each column dimension:  [ 5.52221e+09 9.1134e+09 5.92818e+09 7.42173e+09 7.48633e+09 7.21316e+09 6.9515e+09 7.03883e+09 6.40267e+09 5.83088e+09 5.01438e+09 5.1575e+09 4.28688e+09 3.529e+09 3.12182e+09 2.28721e+09 1.76343e+09 1.35117e+09 8.72517e+08 5.31836e+08 2.65112e+08 9.20308e+07 1.24084e+07 3.56008e+06 4.25283e+07 1.09786e+08 1.88937e+08 2.60207e+08 3.23115e+08 3.56371e+08 3.69035e+08 3.65216e+08 3.89125e+08 4.07064e+08 3.40407e+08 2.65444e+08 2.50244e+08 2.05726e+08 1.60606e+08 1.07217e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:106) self-product of 2nd features for each column dimension:  [ 5.5223e+09 9.11355e+09 5.92812e+09 7.4218e+09 7.48666e+09 7.21338e+09 6.95174e+09 7.03895e+09 6.40254e+09 5.83113e+09 5.01411e+09 5.15774e+09 4.28692e+09 3.52918e+09 3.122e+09 2.28693e+09 1.76326e+09 1.3513e+09 8.72521e+08 5.31802e+08 2.65137e+08 9.20296e+07 1.2408e+07 3.5604e+06 4.25301e+07 1.09793e+08 1.88933e+08 2.60217e+08 3.23124e+08 3.56371e+08 3.69007e+08 3.65176e+08 3.89104e+08 4.07067e+08 3.40416e+08 2.65498e+08 2.50196e+08 2.057e+08 1.60612e+08 1.07192e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:107) cross-product for each column dimension:  [ 5.52209e+09 9.11229e+09 5.92538e+09 7.41665e+09 7.47877e+09 7.20269e+09 6.93785e+09 7.02284e+09 6.38411e+09 5.81143e+09 4.99389e+09 5.13753e+09 4.26792e+09 3.51154e+09 3.10676e+09 2.27436e+09 1.75322e+09 1.34367e+09 8.67367e+08 5.28672e+08 2.63516e+08 9.14194e+07 1.23215e+07 3.53409e+06 4.21905e+07 1.08872e+08 1.87238e+08 2.57779e+08 3.19827e+08 3.5252e+08 3.64691e+08 3.60529e+08 3.84482e+08 4.02396e+08 3.36136e+08 2.61631e+08 2.46931e+08 2.03079e+08 1.5856e+08 1.05738e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:111) Similarity metric for each dimension  [ 0.99997 0.999871 0.999532 0.999311 0.998968 0.998533 0.998019 0.997719 0.997111 0.996644 0.995941 0.996104 0.995572 0.995028 0.995147 0.994445 0.994258 0.994402 0.994095 0.994084 0.993934 0.993363 0.993015 0.992655 0.992037 0.991645 0.991017 0.990649 0.98981 0.989195 0.988267 0.987222 0.988093 0.98853 0.987442 0.985534 0.986858 0.987196 0.987242 0.986318 ]
 (1.0 means identical, the smaller the more different)
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:116) Overall similarity for the two feats is:0.993119 (1.0 means identical, the smaller the more different)
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:119) Processed 20960 feature files, 0 had errors.
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:126) Features are considered similar since 0.993119 >= 0.99
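The similarity metric that compare-feats reports is, per dimension, the cross-product normalized by the geometric mean of the two self-products (a cosine-style similarity, consistent with the log above: e.g. 5.52209e9 / sqrt(5.52221e9 * 5.5223e9) ≈ 0.99997). A minimal Python sketch of that computation, not Kaldi's actual implementation:

```python
import math

def similarity_per_dim(feats_a, feats_b):
    """Cosine-style similarity per column, as reported by compare-feats:
    cross / sqrt(self_a * self_b); 1.0 means identical."""
    dims = len(feats_a[0])
    self_a = [0.0] * dims
    self_b = [0.0] * dims
    cross = [0.0] * dims
    for row_a, row_b in zip(feats_a, feats_b):
        for d in range(dims):
            self_a[d] += row_a[d] ** 2
            self_b[d] += row_b[d] ** 2
            cross[d] += row_a[d] * row_b[d]
    return [c / math.sqrt(sa * sb) for c, sa, sb in zip(cross, self_a, self_b)]

# Identical feature streams give similarity 1.0 in every dimension.
a = [[1.0, 2.0], [3.0, 4.0]]
print(similarity_per_dim(a, a))  # → [1.0, 1.0]
```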

* [egs] Fixed file path RE augmentation, in aspire recipe (kaldi-asr#3388)

* [scripts] Update taint_ctm_edits.py, RE utf-8 encoding (kaldi-asr#3392)

* [src] Change nnet3-am-copy to allow more manipulations (kaldi-asr#3393)

* [egs] Remove confusing setting of overridden num_epochs variable in aspire (kaldi-asr#3394)

* [build] Add a missing dependency for "decoder" in Makefile (kaldi-asr#3397)

* [src] CUDA decoder performance patch (kaldi-asr#3391)

* [build,scripts] Dependency fix; add cross-references to scripts (kaldi-asr#3400)

* [egs] Fix cleanup-after-partial-download bug in aishell  (kaldi-asr#3404)

* [src] Change functions like ApplyLog() to all work out-of-place (kaldi-asr#3185)

* [src] Make stack trace display more user friendly (kaldi-asr#3406)

* [egs] Fix to separators in Aspire reverb recipe (kaldi-asr#3408)

* [egs] Fix to separators in Aspire, related to kaldi-asr#3408 (kaldi-asr#3409)

* [src] online2-tcp, add option to display start/end times (kaldi-asr#3399)

* [src] Remove debugging assert in cuda feature extraction code (kaldi-asr#3411)

* [scripts] Fix to checks in adjust_unk_graph.sh (kaldi-asr#3410)

bash test `-f` does not work for `phones/` which is a directory. Changed it to `-e`.

* [src] Added GPU feature extraction (will improve speed of GPU decoding) (kaldi-asr#3390)

Currently only supports MFCC features.

* [src] Fix build error introduced by race condition in PR requests/accepts. (kaldi-asr#3412)

* [src] Added error string to CUDA allocation errors. (kaldi-asr#3413)

* [src] Fix CUDA_VERSION number in preprocessor checks (kaldi-asr#3414)

* [src] Fix build of online feature extraction with older CUDA version (kaldi-asr#3415)

* [src] Update Insert function of hashlist and decoders (kaldi-asr#3402)

makes interface of HashList more standard; slight speed improvement.

* [src] Fix spelling mistake in kaldi-asr#3415 (kaldi-asr#3416)

* [build] Fix configure bug RE CuSolver (kaldi-asr#3417)

* [src] Enable an option to use the GPU for feature extraction in GPU decoding (kaldi-asr#3420)

This is turned on by using the option
--gpu-feature-extract=true, which is on by default.  We provide the
option to turn it off because in situations where CPU resources
are unconstrained you can get slightly higher performance with CPU
feature extraction, but in most cases GPU feature extraction is faster
and has more stable performance.  In addition, a user may wish to
turn it off to support models whose feature extraction is currently
incomplete on the GPU (e.g. FBANK, PLP, PITCH, etc.).  We will add those
features in the future, but for now a user wanting to decode such
models should place feature extraction on the host.

* [egs] Replace $cuda_cmd with $train_cmd for FarsDat (kaldi-asr#3426)

* [src] Remove outdated comment (kaldi-asr#3148) (kaldi-asr#3422)

* [src] Adding missing thread.join in CUDA decoder and fixing two todos (kaldi-asr#3428)

* [build] Add missing lib dependency in cudafeatbin (kaldi-asr#3427)

* [egs] Small fix to aspire run_tdnn_7b.sh (kaldi-asr#3429)

* [build] Fix to cuda makefiles, thanks: [email protected] (kaldi-asr#3431)

* [build] Add missing deps to cuda makefiles, thanks: [email protected] (kaldi-asr#3432)

* [egs] Fix encoding issues in Chinese ASR recipe (kaldi-asr#3430) (kaldi-asr#3434)

* Revert "[src] Update Insert function of hashlist and decoders (kaldi-asr#3402)" (kaldi-asr#3436)

This reverts commit 5cc7ce0.

* [src] Update Insert function of hashlist and decoders (kaldi-asr#3402) (kaldi-asr#3438)

makes interface of HashList more standard; slight speed improvement.  Fixed version of kaldi-asr#3402

* [build] Fix the cross-compiling issue for Android under MacOS (kaldi-asr#3435)

* [src] Marking operator as __host__ __device__ to avoid build issues (kaldi-asr#3441)

avoids cudafeat build failures with some CUDA toolkit versions

* [egs] Fix perl encoding bug (was causing crashes) (kaldi-asr#3442)

* [src] Cuda decoder fixes, efficiency improvements (kaldi-asr#3443)

* [scripts] Fix shebang of taint_ctm_edits.py to invoke python3 directly (kaldi-asr#3445)

* [src] Fix to a check in nnet-compute code (kaldi-asr#3447)

* [src,scripts] Various typo fixes and stylistic fixes (kaldi-asr#3153)

* [scripts] Scripts for VB (variational bayes) resegmentation for Callhome diarization (kaldi-asr#3305)

This refines the segment boundaries.  Based on code originally by Lukas Burget from Brno.

* [scripts] Extend utils/data/subsegment_data_dir.sh to copy reco2dur (kaldi-asr#3452)

* [src,scripts,egs]  Add code and example for SpecAugment in nnet3 (kaldi-asr#3449)

* [scripts] Make segment_long_utterance honor frame_shift (kaldi-asr#3455)

* [scripts] Fix to steps/nnet/train.sh (nnet1) w.r.t. incorrect bash test expressions (kaldi-asr#3456)

* [egs] Fixed a bug in egs/gale_arabic/s5c/local/prepare_dict_subword.sh where it could delete words matching '<*>' (kaldi-asr#3465)

* [src,build] Small fixes (kaldi-asr#3472)

* [src] fixed warning: moving a temporary object prevents copy elision

* [scripts] tools/extras/check_dependencies.sh look for alternative MKL locations via MKL_ROOT environment variable

* [src] Fixed compilation error when DEBUG is defined

* [egs] Add MGB-2 Arabic recipe (kaldi-asr#3333)

* [scripts] Check/fix utt2num_frames when fixing data dir. (kaldi-asr#3482)

* [src] A couple small bug fixes. (kaldi-asr#3477)

* [src] A couple small bug fixes.

* [src] Fix RE pdf-class 0, which is valid in pre-kaldi10

* [src,scripts] Cosmetic,file-mode fixes; fix to nnet1 align.sh introduced in kaldi-asr#3383 (kaldi-asr#3487)

* [src] Small cosmetic and file-mode fixes

* [src] Fix bug in nnet1 align.sh introduced in kaldi-asr#3383

* [egs] Add missing script in MGB2 recipe (kaldi-asr#3491)

* [egs] Fixing nnet1 bug introduced in kaldi-asr#3383 (rel. to kaldi-asr#3487) (kaldi-asr#3494)

* [src] Fix for nnet3 bug encountered when implementing deltas. (kaldi-asr#3495)

* [scripts,egs] Replace LDA layer with delta and delta-delta features (kaldi-asr#3490)

WER is about the same but this is simpler to implement and more standard.
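Delta and delta-delta features are computed with the standard regression formula over a small context window (window=2 is the common default). A scalar-per-frame sketch under those assumptions, not Kaldi's add-deltas code:

```python
def add_deltas(feats, window=2):
    """Append delta and delta-delta values computed with the standard
    regression formula d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).
    Here each frame is a scalar for simplicity; real features are vectors."""
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def deltas(seq):
        T = len(seq)
        out = []
        for t in range(T):
            acc = 0.0
            for n in range(1, window + 1):
                # Replicate edge frames at the boundaries, as is conventional.
                fwd = seq[min(t + n, T - 1)]
                bwd = seq[max(t - n, 0)]
                acc += n * (fwd - bwd)
            out.append(acc / denom)
        return out

    d = deltas(feats)       # first-order deltas
    dd = deltas(d)          # delta-deltas (deltas of deltas)
    return list(zip(feats, d, dd))

# On a linear ramp, the interior delta equals the slope (1.0 per frame).
rows = add_deltas([0.0, 1.0, 2.0, 3.0, 4.0])
print(rows[2])  # → (2.0, 1.0, 0.0)
```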

* [egs] Add updated tdnn recipe for AMI (kaldi-asr#3497)

* [egs] Create MALACH recipe based on s5b for AMI (kaldi-asr#3496)

* [scripts] add --phone-symbol-table to prepare_lang_subword.sh (kaldi-asr#3485)

* [scripts] Option to prevent crash when adapting on much smaller data (kaldi-asr#3506)

* [build,scripts] Make OpenBLAS install check for gfortran; documentation fix (kaldi-asr#3507)

* [egs] Update chain TDNN-F recipe for CHIME s5 to match s5b, improves results (kaldi-asr#3505)

* [egs] Fix to kaldi-asr#3505: updating chime5 TDNN-F script (kaldi-asr#3508)

* [scripts] Fixed issue that leads to empty segment file (kaldi-asr#3510)

Fixed issue that leads to empty segment file on SSD disk (more detail: https://groups.google.com/forum/#!topic/kaldi-help/Ij3lQLCinN8)

* [egs] Fix bug in AMI s5b RE overlapping segments (kaldi-asr#3503)

* [egs] Small cosmetic change in extend_vocab_demo.sh (kaldi-asr#3516)

* [src] Cosmetic changes; fix windows-compile bug reported by @spencerkirn (kaldi-asr#3515)

* [src] Move cuda gpu from nnetbin to nnet3bin.  (kaldi-asr#3513)

* [egs] Fix a bug in egs/voxceleb/v1/local/make_voxceleb1_v2.pl when preparing the file data/voxceleb1_test/trials (kaldi-asr#3512)

* [egs] Fixed some bugs in mgb_data_prep.sh of mgb2_arabic (kaldi-asr#3501)

* [src,scripts] fix various typos and errors in comments (kaldi-asr#3454)

* [src] Move cuda-compiled to nnet3bin (kaldi-asr#3517)

* [src] Fix binary name in Makefile, RE cuda-compiled (kaldi-asr#3518)

* [src] buffer fix in cudafeat (kaldi-asr#3521)

* [src] Hopefully make it possible to use empty FST in grammar-fst (kaldi-asr#3523)

* [src] Add option to convert pdf-posteriors to phones (kaldi-asr#3526)

* [src] Fix GetDeltaWeights for long-running online decoding (kaldi-asr#3528)

* Fix GetDeltaWeights for long-running online decoding. Use the frame count relative to the decoder start internally in silence weighting, and convert to the frame count relative to the pipeline only once the result is calculated.

* Add a note about transition to a new function API

* [src] Small fix to post-to-phone-post.cc (problem introduced in from kaldi-asr#3526) (kaldi-asr#3534)

* [src]: adding Dan's fix to a bug in nnet-computation-graph (kaldi-asr#3531)

* [egs] Replace prep_test_aspire_segmentation.sh (kaldi-asr#2943) (kaldi-asr#3530)

* [egs] OCR: Decomposition for CASIA and YOMDLE_ZH datasets (kaldi-asr#3527)

* [build] check_dependencies.sh: correct installation command for fedora (kaldi-asr#3539)

* [src,doc] Fix bug in new option of post-to-phone-post; skeleton of faq page (kaldi-asr#3540)

* [egs,scripts]  Adding possibility to have 'online-cmn' on input of 'nnet3' models (kaldi-asr#3498)

* [scripts] Fix to build_tree_multiple_sources.sh (kaldi-asr#3545)

* [doc] Fix accidental overwriting of kws page (kaldi-asr#3541)

* [egs] Fix regex filter in Fisher preprocessing (was excluding 2-letter sentences like "um") (kaldi-asr#3548)

* [scripts] Fix to bug introduced in kaldi-asr#3498 RE ivector-extractor options mismatch. (kaldi-asr#3549)

* [scripts] Fix awk compatibility issue; be more careful about online_cmvn file (kaldi-asr#3550)

* [src] Add a method for backward-compatibility with previous API (kaldi-asr#3536)

* [src] Filterbank (fbank) feature extraction using CUDA (kaldi-asr#3544)

Following this change, both MFCC and fbank run
through a single code path with parameters
(use_power, use_log_fbank and use_dct) controlling
the flow.

CudaMfcc has been renamed to CudaSpectralFeatures.

It contains an MfccOptions structure which contains
FrameOptions and MelOptions. It can be initialized
either with an MfccOptions object or an FbankOptions
object.

Compared with CudaMfccOptions, CudaSpectralOptions also
contains these parameters

use_dct - switches on the discrete cosine transform and liftering
use_log_fbank - takes the log of the mel filterbank values
use_power - uses power in place of abs(amplitude)

Each of these defaults to on for MFCC. For fbank,
use_dct is set to false. The others are set by user
parameters.

Also added a unit test for CUDA Fbank
(cudafeatbin/compute-fbank-feats-cuda).
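The flag-controlled single code path described above can be sketched as follows. This is a scalar Python illustration of the flow, not the CUDA code; the parameter names `mel_weights` and `dct_matrix` are assumptions standing in for the mel filterbank and DCT coefficient matrices:

```python
import math

def spectral_features(power_spectrum, mel_weights, dct_matrix,
                      use_power=True, use_log_fbank=True, use_dct=True):
    """One code path for MFCC and fbank, as in CudaSpectralFeatures:
    MFCC keeps all three flags on; fbank sets use_dct=False."""
    # use_power: power spectrum vs. abs(amplitude).
    spec = power_spectrum if use_power else [math.sqrt(x) for x in power_spectrum]
    # Mel filterbank: one weighted sum over spectrum bins per mel bin.
    banks = [sum(w * s for w, s in zip(row, spec)) for row in mel_weights]
    if use_log_fbank:
        banks = [math.log(b) for b in banks]
    if not use_dct:
        return banks  # fbank features stop here
    # DCT (liftering omitted here) yields cepstral coefficients (MFCC).
    return [sum(c * b for c, b in zip(row, banks)) for row in dct_matrix]
```

With an identity filterbank and identity DCT on a flat unit spectrum, both paths return zeros (log of 1), which makes the flag behavior easy to check.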

* [src] Fix missing semicolon (kaldi-asr#3551)

* [src] Fix a typo that used the assignment sign in place of not-equal in the CUDA feature pipeline (kaldi-asr#3552)

* [src] Fix issue kaldi-asr#3401 (crash in ivector extraction with max-remembered-frames+silence-weight) (kaldi-asr#3405)

* [egs] Librispeech: in RESULTS, change best_wer.sh => utils/best_wer.sh (kaldi-asr#3553)

* [egs] semisupervised recipes: fixing some variables in comments  (kaldi-asr#3547)

* [scripts]  fix utils/lang/extend_lang.sh to add nonterm symbols in align_lexicon.txt (kaldi-asr#3556)

* [scripts] Fix to bug in steps/data/data_dir_manipulation_lib.py (kaldi-asr#3174)

* [src] Fix in nnet3-attention-component.cc, RE required context (kaldi-asr#3563)

* [src] Temporarily turn off some model-collapsing code while investigating a bug (kaldi-asr#3565)

* [scripts] Fix to data cleanup script RE utf-8 support in utterances

* [src,scripts,egs] online-cmvn for online2 with chain models, (kaldi-asr#3560)

* Add OnlineCMVN to Online NNET2 pipeline.  This is used in some models (e.g.
CVTE).  This is optional and off by default.  It applies CMVN before
applying pitch.  This code is essentially copied out of
"online2/online-feature-pipeline.cc/h".

Patch set provided by Levi Barnes.

* online2: bugfix of config script; include ivector_period in the ivector-extractor config file.

* online-cmvn in online-nnet2-feature-pipeline,

- update the C++ code from @luitjens,
   - introduced `OnlineFeatureInterface *nnet3_feature_` to
     explicitly mark features that are passed to the nnet3 model
   - added the transfer of OnlineCmvnState across utterances from the same speaker
- update the 'prepare_online_decoding.sh' to support online-cmvn
- enabled OnlineCmvnStats transfer in training/decoding

* OnlineNnet2FeaturePipeline, removing unused constructor, updating results
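Online CMVN normalizes each frame using only statistics accumulated from frames seen so far, typically primed with global stats so early frames are not normalized against near-empty counts. A mean-only, scalar-frame sketch under those assumptions; the real OnlineCmvn class additionally uses a sliding window and speaker-level smoothing:

```python
def online_cmvn(frames, global_mean=0.0, global_count=1.0):
    """Mean-subtract each (scalar) frame using a running mean over the
    frames decoded so far, primed with a global mean so that early
    frames are normalized against non-trivial statistics."""
    total = global_mean * global_count
    count = global_count
    out = []
    for x in frames:
        total += x
        count += 1.0
        out.append(x - total / count)  # causal: uses frames up to and including x
    return out
```

With strong global priming (large `global_count`), frames matching the global mean come out near zero, which is the intended effect for the start of an utterance.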

* [build,src] Change to error message; update kaldi_lm install script (and kaldi_lm) to fix compile issue (kaldi-asr#3581)

* [src] Clarify something in plda.h (cosmetic change) (kaldi-asr#3588)

* [src] Small fix to online ivector extraction (RE kaldi-asr#3401/kaldi-asr#3405), thanks: Vladmir Vassiliev. (kaldi-asr#3592)

Stops an assert failure by changing the assert condition

* [egs] Add recipe for Chinese training with multiple databases (kaldi-asr#3555)

* [src] Remove duplicate `num_done++` in apply-cmvn-online.cc (kaldi-asr#3597)

* [src] Fix to CMVN with CUDA (kaldi-asr#3593)

* Fix the placement of the `__restrict__` qualifier

Fixing a complaint during Windows compilation

* Plumbing for CUDA CMVN

* Removing detritus.

Removing comments, an unused variable and
NULL'ing some object pointers (just in case)

* [src] Fix two bugs in batched-wav-nnet3-cuda binary. (kaldi-asr#3600)

1) Using the "Any" APIs prior to finishing submission of the full group
could lead to a group finishing early.  This would cause output to
appear repeatedly.  I have not seen this occur, but an audit revealed it
as an issue. The fix is to use the "Any" APIs only when full groups have
been submitted.

2) GetNumberOfTasksPending() can return zero even though groups have not
been waited for.  This API call should not be used to determine if all
groups have been completed as the number of pending tasks is independent
of the number of groups remaining.
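The distinction in (2) — pending tasks reaching zero does not imply every group has been waited for — can be illustrated with a toy bookkeeping sketch (the class and method names here are hypothetical, not the pipeline's actual API):

```python
class GroupTracker:
    """Tracks tasks and groups separately, mirroring the pipeline's
    bookkeeping: tasks finish asynchronously, but a group only counts
    as done once wait_group() has been called for it."""
    def __init__(self):
        self.pending_tasks = 0
        self.unwaited_groups = 0

    def submit_group(self, num_tasks):
        self.pending_tasks += num_tasks
        self.unwaited_groups += 1

    def finish_task(self):
        self.pending_tasks -= 1

    def wait_group(self):
        self.unwaited_groups -= 1

t = GroupTracker()
t.submit_group(2)
t.finish_task()
t.finish_task()
# All tasks have finished, yet the group has not been waited for:
assert t.pending_tasks == 0 and t.unwaited_groups == 1
```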

* [scripts] propagated bad whitespace fix to validate_dict_dir.pl (cosmetic change) (kaldi-asr#3601)

* [scripts] bug-fix on subset_data_dir.sh with --per-spk option (kaldi-asr#3567) (kaldi-asr#3572)

The script failed to return the number of utterances requested with `--per-spk`, despite them being available in the original spk2utt file.
The stop boundary of the awk for-loop was off by 1.

> An example of how this affects the spk2utt files:

```$ cat tmpData/spk2utt 
spk1 spk1-utt1 spk1-utt2 spk1-utt3
spk2 spk2-utt1
spk3 spk3-utt1 spk3-utt2
spk4 spk4-utt1 spk4-utt2 spk4-utt3 spk4-utt4
$
$ ./subset_data_dir.sh --per-spk tmpData 2 tmpData-before
$ cat tmpData-before/spk2utt 
spk1 spk1-utt1 
spk2 spk2-utt1 
spk3 spk3-utt1 
spk4 spk4-utt1 spk4-utt3 
$
$ ./subset_data_dir_MOD.sh --per-spk tmpData 2 tmpData-after
$ cat tmpData-after/spk2utt 
spk1 spk1-utt1 spk1-utt2 
spk2 spk2-utt1 
spk3 spk3-utt1 spk3-utt2 
spk4 spk4-utt1 spk4-utt3 
```
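The selection picks utterances evenly spaced through each speaker's list, and the fix amounts to an inclusive loop bound so exactly n indices are taken. A Python sketch of the corrected logic (the helper name `subset_per_spk` is hypothetical; the real script does this in awk):

```python
def subset_per_spk(spk2utt, n):
    """Pick up to n utterances per speaker, evenly spaced through the
    speaker's utterance list, mirroring subset_data_dir.sh --per-spk."""
    out = {}
    for spk, utts in spk2utt.items():
        if len(utts) <= n:
            out[spk] = list(utts)
        else:
            step = len(utts) / n
            # Take exactly n indices: 0, step, 2*step, ... (floored).
            out[spk] = [utts[int(i * step)] for i in range(n)]
    return out
```

This reproduces the "after" behavior shown above: spk1 with 3 utterances and n=2 yields utt1 and utt2, while spk4 with 4 utterances yields utt1 and utt3.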

* [src] Code changes to support GCC9 + OpenFST1.7.3 + C++2a (namespace issues) (kaldi-asr#3570)

* [scripts] Training ivector-extractor: make online-cmvn per speaker. (kaldi-asr#3615)

* [src] cached compiler I/O for nnet3-xvector-compute (kaldi-asr#3197)

* [src] Fixed online2-tcp-nnet3-decode-faster.cc poll_ret checks (kaldi-asr#3611)

* [scripts] Call utils/data/get_utt2dur.sh using the correct $cmd and $nj (kaldi-asr#3608)

* [scripts] Enable tighter control of downloaded dependencies (kaldi-asr#3543) (kaldi-asr#3573)

* [scripts] Make reverberate_data_dir.py handle vad.scp (kaldi-asr#3619)

* [scripts] Don't get utt2dur in librispeech.. will be made by make_mfcc.sh now (kaldi-asr#3610) (kaldi-asr#3620)

* [scripts] Make combine_ali_dirs.sh work when queue.pl is used (kaldi-asr#3537) (kaldi-asr#3561)

* [egs] Fix duplicate removal of unk from Librispeech decoding graphs (kaldi-asr#3476) (kaldi-asr#3621)

* [build] Check for gfortran, needed by OpenBLAS (for lapack) (kaldi-asr#3622)

* [scripts] VB resegmentation: load MFCCs only when used (save memory) (kaldi-asr#3612)

* [build] removed old Docker files - see docker in the root folder for the latest files (kaldi-asr#3558)

* [src] Fix to compute-mfcc-feats.cc (thanks: @puneetbawa) (kaldi-asr#3623)

* [egs] Update AMI tdnn recipe to reflect what was really run, and add hires feats. (kaldi-asr#3578)

* [egs] Fix to run_ivector_common.sh in swbd (crash when running from stage 3) (kaldi-asr#3631)

* [scripts] Make data augmentation work with UTF-8 utterance ids/filenames (kaldi-asr#3633)

* [src] fix a bug in src/online/online-faster-decoder.cc (prevent segfault with some models) (kaldi-asr#3634)

* [scripts]  Make extend_lang.sh support lexiconp_silprob.txt (kaldi-asr#3339) (kaldi-asr#3632)

* [scripts] Fix typo in analyze_alignments.sh (kaldi-asr#3635)

* [scripts] Change the GPU policy of align.sh to wait instead of yes (kaldi-asr#3636)

This is needed to avoid failures when using more jobs than GPUs. Previously, with --use-gpu=yes, a job would error out in 20s when all GPUs were occupied. Changing the policy to wait gives the correct behavior.

* [egs] Added tri3b and chain training for Aurora4 (kaldi-asr#3638)

* [build] fixed broken Docker builds by adding gfortran package (kaldi-asr#3640)

* [src] Fix bug in resampling checks for online features (kaldi-asr#3639)

The --allow-upsample and --allow-downsample options were not handled correctly in the code that handles resampling for computing online features.

* [build] Bump OpenBLAS version to 0.3.7 and enable locking (kaldi-asr#3642)

Bumps OpenBLAS version to 0.3.7 for improved performance with Zen architecture. Also sets USE_LOCKING which fixes issue when calling a single-threaded OpenBLAS from several threads in parallel

* [scripts] Update nnet3_to_dot.py to ignore Scale descriptors (kaldi-asr#3644)

* [src] Speed optimization for decoders (call templates properly) (kaldi-asr#3637)

* [doc] update FAQ page to include some FAQ candidates from Kaldi mailing lists (kaldi-asr#3646)

* [egs] Small fix: duplicate xent settings from examples (kaldi-asr#3649)

* [doc] Fix typo (kaldi-asr#3648)

* [egs] Fix a bug in Tedlium run_ivector_common.sh (kaldi-asr#3647)

Running perturb_data_dir_speed_3way.sh after making MFCCs would cause an error saying "feats.scp already exists". It is also unnecessary to run it twice.

* [src] Fix to matrix-lib-test to ignore small difference (kaldi-asr#3650)

* [doc] update FAQ page: (kaldi-asr#3651)

1. Added some new FAQ candidates from mailing lists.
2. Added sections for the logo, versioning, and book recommendations for beginners.

* [scripts] Modify split_data.sh to split data evenly when utt2dur exists (kaldi-asr#3653)
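Splitting evenly by duration rather than by utterance count can be sketched as a greedy assignment to the currently lightest split. This is an illustrative sketch only, not the actual split_data.sh / split_scp.pl algorithm:

```python
def split_by_duration(utt2dur, num_splits):
    """Greedily assign each utterance (longest first) to the split with
    the smallest total duration so far, balancing splits by duration
    rather than by utterance count."""
    splits = [[] for _ in range(num_splits)]
    totals = [0.0] * num_splits
    for utt, dur in sorted(utt2dur.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # lightest split so far
        splits[i].append(utt)
        totals[i] += dur
    return splits
```

For example, durations {a: 3.0, b: 1.0, c: 2.0} over two splits land as [a] and [c, b], giving 3.0 seconds per split.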

* [doc] update FAQ page: added section for free dataset, python wrapper, etc. (kaldi-asr#3652)

* [src] Some CUDA i-vector fixes (kaldi-asr#3660)

no longer assume we are starting at frame 0.  This is needed for eventual online decoding.

  src/cudafeat/online-ivector-feature-cuda.cc:
    Added code to use LU decomposition for debugging.  Cholesky is still used and works fine, but this is a good test if something wrong is suspected in the solver.
    Added support for older toolkits, but with a less efficient algorithm.

* [src] CUDA batched decoder pipeline fixes (kaldi-asr#3659)

bug fix: don't assume the stride and the number of columns are the same when packing a matrix into a vector.

  src/cudadecoder/batched-threaded-nnet3-cuda-pipeline.cc:
    added additional sanity checking to better report errors in the field.
    no longer pinning memory for copying waves down.  This was causing consistency issues.  It is unclear why the code is not working; we will continue to evaluate.
    This optimization doesn't add a lot of perf, so we are disabling it for now.
    general cleanup
    fixed a bug where the tasks array could be resized before being read.

  src/cudadecoderbin/batched-wav-nnet3-cuda.cc:
    Now outputting every iteration as a different lattice.  This way we can score every lattice and better ensure the correctness of the binary.
    clang-format (removing tabs)

* [egs] Small fix to aspire example: pass in num_data_reps (kaldi-asr#3665)

* [doc] Update FAQ page (kaldi-asr#3663)

1. Added a section on Android for Kaldi
2. Listed some useful scripts for data processing
3. Illustrated some common tools

* [src] Add batched xvector computation (kaldi-asr#3643)

* [src] CUDA decoder: write all output to single stream (kaldi-asr#3666)

.. instead of one per iteration.

We are seeing that runs on small corpora are dominated by writer Open/Close costs.

A better solution is to modify the key with the iteration number and
then handle the modified keys in scoring.  Note that if you have just a
single iteration the key is not modified and thus behavior is as
expected.
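The key-modification scheme described above can be sketched in a few lines; the exact key format (utterance id plus an iteration suffix) is an assumption here:

```python
def output_key(utt_id, iteration, num_iterations):
    """Append the iteration number to the key only when there is more
    than one iteration, so single-iteration behavior is unchanged and
    scoring can disambiguate repeated utterances otherwise."""
    if num_iterations == 1:
        return utt_id
    return f"{utt_id}-{iteration}"
```

With one iteration the key passes through untouched, so existing scoring scripts keep working.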

* [egs] Fix sox command in multi_en setup (updated sox command) (kaldi-asr#3667)

* [src] Change data-type for hash in lattice alignment (avoid debugger complaints)  (kaldi-asr#3672)

* [egs] Two fixes to multi_en setup  (kaldi-asr#3676)

* [scripts] Fix misspelled variable in reverb script (kaldi-asr#3678)

* Update cudamatrix.dox (kaldi-asr#3674)

* [scripts] Add reduced-context option for TDNN-F layers. (kaldi-asr#3658)

* [egs] Make librispeech run_tdnn_discriminative.sh more efficient with disk (kaldi-asr#3685)

* [egs] Remove pitch from multi_cn nnet3 recipe (kaldi-asr#3686)

* [egs] multi_cn: clarify metric is CER (kaldi-asr#3687)

* [src,minor] Fix typo in comment in kaldi-holder.h (kaldi-asr#3691)

* [egs] update run_tdnn_discriminative.sh in librispeech recipe (kaldi-asr#3689)

* [egs] Aspire recipe: fixed typo in utterance and speaker prefix (kaldi-asr#3696)

* [src] cuda batched decoder, fix memory bugs (kaldi-asr#3697)

bug fix:  Pass the token by reference so we can return an address of it.
Previously it was passed by value, which meant we returned the address of
a locally scoped variable. This led to memory corruption.

re-add pinned memory (i.e. disable workaround)

Free memory on control thread, switch shared_ptr to unique_ptr, don't
reset unique ptr.

* [scripts] Change make_rttm.py to read/write files with UTF-8 encoding (kaldi-asr#3705)

* [src] Removing non-compiling paranoid asserts in nnet-computation-graph. (kaldi-asr#3709)

* [build] Fix gfortran package name for centos (kaldi-asr#3708)

* [scripts] Change the Python diagnostic scripts to accept non-ASCII UTF-8 phone set (kaldi-asr#3711)

* [scripts] Fix some issues in kaldi-asr#3653 in split_scp.pl (kaldi-asr#3710)

* [scripts] Fix 2 issues in nnet2->nnet3 model conversion script (kaldi-asr#886)(kaldi-asr#3713)

Parsing learning rates would fail if they were written in e-notation (e.g.
1e-5). Also fixes ivector-dim=0, which could result if the const component
dim was 0 in the source nnet2 model.
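The e-notation failure is typical of extracting floats with a digits-and-dot pattern; Python's `float()` accepts e-notation natively. A hedged sketch of robust parsing (the `learning-rate=` line format here is illustrative, not the actual nnet2 model format):

```python
import re

def parse_learning_rate(text):
    """Extract the learning rate from a line like 'learning-rate=1e-05'.
    Grabbing the whole token and letting float() parse it handles both
    plain decimals and e-notation, unlike a \\d+\\.\\d+ style regex."""
    m = re.search(r'learning-rate=(\S+)', text)
    return float(m.group(1)) if m else None
```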

* Removing changes to split_scp.pl (kaldi-asr#3717)

* [scripts] Improve how combine_ali_dirs.sh gets job-specific filenames (kaldi-asr#3720)

* [src] Add --debug-level=N configure option to control assertions (kaldi-asr#3690) (kaldi-asr#3700)

* [src] Adding some more feature extraction options (needed by some users..) (kaldi-asr#3724)

* [src,script,egs] Goodness of Pronunciation (GOP) (kaldi-asr#3703)

* [src] Making ivector extractor tolerate dim mismatch due to pitch (kaldi-asr#3727)

* Revert "[src] Making ivector extractor tolerate dim mismatch due to pitch (kaldi-asr#3727)" (kaldi-asr#3728)

This reverts commit 59255ae.

* [src] Fix NVCC compilation errors on Windows (kaldi-asr#3741)

* make nvcc+msvc happier when two-phase name lookup is involved;
NOTE: in the simple native case (CPU, without CUDA), msvc is happy with TemplatedBase<T>::base_value, but nvcc is not...
* the position of __restrict__ in msvc is restricted

* [build] Add CMake Build System as alternative to current Makefile-based build (kaldi-asr#3580)

* [scripts] Modify split_data_dir.sh and split_scp.pl to use utt2dur if present, to balance splits

* [scripts] fix slurm.pl error (kaldi-asr#3745)

* Revert "[scripts] Modify split_data_dir.sh and split_scp.pl to use utt2dur if present, to balance splits" (kaldi-asr#3746)

This reverts commit 1d0b267.

* [egs] Children's speech ASR recipe for cmu_kids and cslu_kids (kaldi-asr#3699)

* [src] Incremental determinization [cleaned up/rewrite] (kaldi-asr#3737)

* [scripts] Add scripts to create combined fmllr-transform dirs (kaldi-asr#3752)

* [src] CUDA decoder: fix invalid-lattice error that happens in corner cases (kaldi-asr#3756)

* [egs] Add Chime 6 baseline system (kaldi-asr#3755)

* [scripts] Fix issue in copy_lat_dir.sh affecting combine_lat_dirs.sh (missing phones.txt) (kaldi-asr#3757)

* [src] Add missing #include, needed for CUDA decoder compilation on some platforms (kaldi-asr#3759)

* [scripts] fix bug in steps/data/reverberate_data_dir.py (kaldi-asr#3762)

* [src] CUDA allocator: fix order of next largest block (kaldi-asr#3739)

* [egs] some fixes in Chime6 baseline system (kaldi-asr#3763)

* changed scoring tool for diarization

* added comment for scoring

* fixing the number of deletions; adding a script to check that the DP result of the total errors equals the sum of the individual errors

* updated RESULTS for new diarization scoring

* outputting WER similar to the compute-wer routine

* adding a routine to select the best LM weight and insertion penalty based on the development set

* updating results

* changing lang_chain to lang, minor fix

* adding all array option

* change in directory structure of scoring_kaldi_multispeaker to make it similar to scoring_kaldi

* removing test sets from run ivector script

* added ref RTTM creation

* making modifications for all array

* minor fix

* [src] CUDA decoding: add support for affine transforms to CUDA feature extractor (kaldi-asr#3764)

* [src] relax assertion constraint slightly (RE matrix orthonormalization) (kaldi-asr#3767)

* [src] CUDA decoder: fix bug in NumPendingTasks()  (kaldi-asr#3769)

* [src] Add options to select specific gpu, reuse cuda context (kaldi-asr#3750)

* [src] Move CheckAndFix to config struct (kaldi-asr#3749)

* [egs,scripts] Add recipes for CN-Celeb (kaldi-asr#3758)

* [src] CUDA decoder: remove unecessary sync that was added for debugging (kaldi-asr#3770)

* [src] CUDA decoder: shrink channel vectors instead of vector holding those vectors (kaldi-asr#3768)

* add include path

* update test