Python script to convert nnet2 to nnet3 models #1611

jfainberg · 2017-05-07T11:07:17Z

Initial implementation. I'll test it a bit more over the next few days, but it is open for comments.

A couple of points:

- The script currently treats everything but the final model as temporary (including the generated config it uses to initialise the model).
- It uses numpy to combine matrices and vectors into a single matrix for initialisation. This could be done with plain file manipulation, but that would be a bit messier.
- It ignores the ValueSum and DerivSum stat fields of e.g. normalize and softmax components.
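
To make the matrix/vector combination concrete, here is a minimal sketch without numpy. It assumes the combined initialisation matrix stores the bias as one extra final column next to the linear parameters; that layout, and the name `combine_linear_and_bias`, are assumptions for illustration and are not taken from the script in this PR.

```python
def combine_linear_and_bias(linear_params, bias_params):
    """Append each bias value as a final column of the matching weight row.

    linear_params: list of rows (one row per output unit).
    bias_params:   list with one bias value per output unit.
    The "bias as last column" layout is an assumption of this sketch.
    """
    if len(linear_params) != len(bias_params):
        raise ValueError("expected one bias value per weight-matrix row")
    return [list(row) + [bias] for row, bias in zip(linear_params, bias_params)]


# Two output units, two inputs: the result has shape 2 x 3.
combined = combine_linear_and_bias([[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5])
```

With numpy, stacking the matrix and the bias column horizontally (for example with `numpy.hstack`) would give the same result in one call.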

vijayaditya · 2017-05-07T13:09:00Z

"Uses numpy to combine matrices and vectors into a single matrix for initialisation. Possible to do using just file manipulation, just a bit more messy."

I think this dependence on numpy for cleaner code is fine, as this binary is going to be used by people who might have sufficient experience with Kaldi.

vijayaditya · 2017-05-08T23:09:09Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+KNOWN_COMPONENTS = NODE_NAMES.keys()
+# End configuration section
+
+logger = logging.getLogger(__name__)


I am not familiar with the latest logging guidelines. @vimalmanohar, could you take a look?

vimalmanohar · 2017-05-08

The logging is fine, although the Google style guide's guidelines for variable names and spacing are not followed: https://google.github.io/styleguide/pyguide.html

vijayaditya

Here are some comments after a quick initial review.

vijayaditya · 2017-05-08T23:13:55Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+        if (result != 0):
+            raise OSError('Encountered an error writing the model.')
+
+def ParseNnet2(line_buffer):


Better to rename this to ParseNnet2ToNnet3.

vijayaditya · 2017-05-08T23:22:16Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+
+def ConsumeToken(token, line):
+    '''Returns line without token'''
+    if token != line.split()[0]:


IIRC these lines can be very large, so it might be better to check the first token with a regex, or at least to call split() with something like maxsplit=1, rather than splitting the whole line.
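
A minimal sketch of the maxsplit suggestion follows. The snake_case name `consume_token` and the choice to raise `ValueError` on a mismatch are assumptions of this sketch; the diff above does not show how the original function handles a mismatch.

```python
def consume_token(token, line):
    """Return `line` with the leading `token` removed.

    Only the first whitespace-delimited field is split off, so a very long
    line is not tokenised in full just to inspect its first token.
    """
    parts = line.split(None, 1)  # at most one split: [first_token, remainder]
    if not parts or parts[0] != token:
        # Assumed behaviour for illustration: fail loudly on a mismatch.
        raise ValueError("Unexpected token, expected '{0}'".format(token))
    return parts[1] if len(parts) > 1 else ""


rest = consume_token("<LinearParams>", "<LinearParams> [ 1 2 3 ]")
```

Here `rest` holds everything after the first token, and the remainder of the line is never split into individual fields.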

vijayaditya · 2017-05-08T23:27:07Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+
+    def WriteModel(self, model, binary="true"):
+        result = 0
+


Check that self.config is a proper nnet3 config file.

vijayaditya · 2017-05-08T23:30:42Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+                                                                                           self.priors, 
+                                                                                           model), shell=True)
+
+        if (result != 0):


Do you think it might be worth adding a check in a top-level shell script that performs forward propagation through the nnet2 and nnet3 models and asserts that the outputs agree within an acceptable threshold?

I don't know if there is an easy way to check the correctness of the transition model without doing a decode.

jfainberg · 2017-05-09

Good idea. How about adding another argument to the argparser for some features? If it is present, the script would run a forward pass and compare the results using difflib, keeping everything in the single Python script.

The transition model isn't read by numpy, so it shouldn't have any numerical errors, I think.
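
As a sketch of the "within acceptable threshold" check, the comparison could be done numerically rather than textually. Note that this swaps in an element-wise tolerance test (`math.isclose`) in place of the difflib comparison proposed above, since floating-point outputs may differ in their low-order digits. The function name and the tolerance values are hypothetical, and the matrices are assumed to have already been parsed into lists of rows.

```python
import math


def outputs_match(reference, converted, rel_tol=1e-4, abs_tol=1e-6):
    """Return True if two matrices (lists of rows) agree element-wise.

    The tolerances are illustrative defaults, not values from this PR.
    """
    if len(reference) != len(converted):
        return False
    for ref_row, conv_row in zip(reference, converted):
        if len(ref_row) != len(conv_row):
            return False
        for ref_val, conv_val in zip(ref_row, conv_row):
            if not math.isclose(ref_val, conv_val,
                                rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True
```

A mismatch in shape is treated as a failure, so a conversion that drops or adds an output dimension is caught as well as one that changes values.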

vijayaditya · 2017-05-08T23:35:20Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+        result = 0
+
+        # write raw model
+        result += subprocess.call('nnet3-init --binary=true {0} {1}'.format(self.config, os.path.join(tmpdir, 'nnet3.raw')), shell=True)


Do you want to consider using the KaldiCommand function available in the nnet3 Python libraries? It might help you handle some common errors.
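
For illustration, the kind of error handling such a helper provides can be sketched with the standard library alone. `run_command` below is a hypothetical stand-in, not the Kaldi library function; it takes an argument list rather than a formatted shell string and raises if the exit status is non-zero.

```python
import subprocess


def run_command(args):
    """Run a command given as an argument list and return its stdout as text.

    Raises RuntimeError, including the captured stderr, if the command
    exits with a non-zero status.
    """
    process = subprocess.Popen(args, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    if process.returncode != 0:
        raise RuntimeError("Command {0} failed with status {1}: {2}".format(
            args, process.returncode,
            stderr.decode(errors="replace").strip()))
    return stdout.decode(errors="replace")


# Possible use for the call in the diff above (config and raw_model are
# placeholder variables):
# run_command(["nnet3-init", "--binary=true", config, raw_model])
```

Raising at the point of failure avoids summing return codes and checking them later, and passing a list avoids `shell=True` with interpolated paths.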

vijayaditya · 2017-05-08T23:37:34Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

+            for component in self.components:
+                if component.ident == "splice":
+                    # Create splice string for the next node
+                    previous_component=MakeSpliceString(previous_component, component.pairs['<Context>'])


It is not clear how you are handling SpliceMaxComponent here.

jfainberg · 2017-05-09

Sorry, I had forgotten about SpliceMaxComponent; it's not explicitly handled at the moment. What is the equivalent in nnet3? I couldn't find a max descriptor (e.g. 'Max(Offset, ...)').

vijayaditya · 2017-05-08T23:38:00Z

egs/wsj/s5/steps/nnet3/convert_nnet2_to_nnet3.py

@@ -0,0 +1,441 @@
+#!/usr/bin/env python
+# Copyright 2016
+


Please add your name to the author list.

Function/method names, spaces, column width.

vijayaditya · 2017-05-09T11:31:40Z

This is not something that can be done using a descriptor; you would need to add a component. I think the max-pooling component could be used, but I don't remember whether the data needs to be rearranged. However, I know of no architectures that used SpliceMaxComponent in nnet2, so I think you can just raise an error rather than handle this case.
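
A minimal sketch of the "just raise an error" approach is shown below. The set of supported component types and the function name are hypothetical and chosen only for illustration; the script's own list of known components (`KNOWN_COMPONENTS` in the diff above) would take their place.

```python
# Hypothetical subset of supported nnet2 component types, for illustration.
SUPPORTED_COMPONENTS = {"SpliceComponent", "AffineComponent",
                        "SoftmaxComponent"}


def check_component_supported(component_type):
    """Raise NotImplementedError for components with no conversion rule."""
    if component_type not in SUPPORTED_COMPONENTS:
        raise NotImplementedError(
            "Conversion of {0} is not supported.".format(component_type))


check_component_supported("AffineComponent")  # passes silently
```

Failing early on an unknown component such as SpliceMaxComponent is safer than silently skipping it, since a skipped component would produce a model that loads but computes something different.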


danpovey · 2017-06-07T01:22:55Z

So @jfainberg, this is ready to commit? Still says WIP.

danpovey · 2017-06-07T22:57:54Z

@vijayaditya if you say it's still good I'll merge.

jfainberg · 2017-06-07T22:58:26Z

Yes @danpovey, please go ahead. I've tested it (forward pass) with typical nnet2 pnorm and tanh networks.

vijayaditya · 2017-06-07T23:45:40Z

@danpovey Went through it briefly. LGTM.

Python script to convert nnet2 to nnet3 models.

8ba8399

vijayaditya reviewed May 8, 2017

vijayaditya requested changes May 8, 2017

Comply with Google style guidelines

4b6890c

Function/method names, spaces, column width.

Switch to KaldiCommand + review comments

4612502

vijayaditya approved these changes Jun 7, 2017

Restrict choices for binary argument (fix for final review)

91b6daa

jfainberg changed the title ~~WIP: Python script to convert nnet2 to nnet3 models~~ Python script to convert nnet2 to nnet3 models Jun 7, 2017

danpovey merged commit a0795ec into kaldi-asr:master Jun 7, 2017

Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018

[scripts] Add python script to convert nnet2 to nnet3 models (kaldi-a…

8413f14

…sr#1611)
