# -*- coding: utf-8 -*-
"""tutorial.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1aK3u6gRONfENmbRf3Rzz-UAlmcTc8jR4
# This shows how to build a TensorFlow (2.x) model with several different methods. Guides on the internet rarely explain them well, especially the pros and cons between the methods (the Chinese-language resources are particularly unhelpful).
The methods are listed below:
1. Sequential
2. Functional
3. Class (model subclassing)
First we talk about Sequential. It only works when your model is one straight chain of layers, so in practice it is rarely enough on its own. Below is a Sequential model with three Dense layers.
"""
import tensorflow as tf  # tf.keras.Model, tf.losses, tf.keras.callbacks etc. are used further below
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential(
[
layers.Dense(2, activation="relu", name="layer1"),
layers.Dense(3, activation="relu", name="layer2"),
layers.Dense(4, name="layer3"),
]
)
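"""A quick sanity check (the feature size 5 below is an arbitrary assumption, not something fixed by the model above): the Sequential layers have no input shape yet, so calling the model once on a dummy batch builds the weights and makes model.summary() usable."""
_dummy = tf.zeros((1, 5))  # dummy batch: 1 sample, 5 features (arbitrary)
_ = model(_dummy)          # the first call builds the model
model.summary()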
"""Next we move to Functional method. This is the most used method in my coding. It has the merit of elasticity build that you can build any types of network with this method, and you don't need to type much like class method. Example below shows the cross layer connection that fails in sequential method. The re-defined x2 variable shows that the tensorflow will calculate the connection for re-defined x2, so you can be ease for it (it is counted in).
Functional是最彈性的搭建方法,所有類型的網路都可以用它建而且不用像class打一堆字,下面的add layer就是sequential法無法做到的。重新定義的x2展示了tensorflow會計算被重新命名的x2變數,所以不用擔心這樣沒接到。
"""
inputs = layers.Input((10, 20), batch_size=None)
x1 = layers.Dense(64, activation="relu")(inputs)
x2 = layers.Dense(64, activation="relu")(x1)
x2 = layers.Add()([x1, x2]) # cross layer connection
x2 = layers.Dense(64)(x2) # re-defined variable x2
outputs = layers.Dense(10)(x2)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
"""You can define batch_size later, the argument here just a early define. If you don't define it here the model.summary() would shows first dimension (batch dimension) None until you fit it. The second bracket behind layers.Dense is where the dense layer connect from.
Next we talk about some trick in functional method. You can get the intermediate output by following, and check the model.outputs checks the output dimension:
layers.Input的batch_size只是提早指定batch size,如果你現在用model.summary()的話第一個維度(batch size的維度)會顯示None。layers.Dense的第二個括弧決定這層是從哪裡接來的。
接下來我們講一些有用的trick,如果要取得中間層的輸出可以這樣寫,然後用model.outputs檢查輸出維度:
"""
model2 = keras.Model(inputs=inputs, outputs=x1) # get intermediate output
model2(YOUR_INPUT_DATA)
print(model2.inputs) # check the model inputs and outputs dimension
print(model2.outputs)
"""If your network is build by many same block, you can do this (you can also do this with sequential method):
如果你的網路是好幾個同樣的小網路構成可以這樣寫(sequential也可以這樣):
"""
# common mistake: inside the loop x1 is always re-connected to inputs,
# so no matter what depth is, only one Dense(64) ends up in the graph
def small_model(depth):
    inputs = layers.Input((10, 20), batch_size=None)
    for i in range(depth):
        x1 = layers.Dense(64, activation="relu")(inputs)  # wrong, always connects to inputs
    x1 = layers.Dense(32, activation="relu")(x1)
    outputs = layers.Dense(10)(x1)
    return keras.Model(inputs=inputs, outputs=outputs)
# correct version, compare the two with small_model(3).summary()
def small_model(depth):
    inputs = layers.Input((10, 20), batch_size=None)
    x1 = layers.Dense(64, activation="relu")(inputs)
    for i in range(depth):  # build multiple layers with a for loop
        x1 = layers.Dense(64, activation="relu")(x1)
    x1 = layers.Dense(32, activation="relu")(x1)
    outputs = layers.Dense(10)(x1)
    return keras.Model(inputs=inputs, outputs=outputs)  # return a "model"
depth = 3  # example depth
x = layers.Input((10, 20), batch_size=None)
x1 = small_model(depth)(x)  # one input (x) feeding two branches (x1, x2)
x2 = small_model(depth)(x)
outputs = layers.Add()([x1, x2])
model3 = keras.Model(inputs=x, outputs=outputs, name="mnist_model")
model3.summary()
"""This shows the network you can define the depth in one shot. It also shows how to outputs multiple layer within single inputs.
Here comes the new problem. What if the def not return model but a tensor (layer)? The answer is you can more easily view all the layers in your network, on the contrary, return model would directly outputs small_block with property model. You can check this by model3.summary() or tf.keras.utils.plot_model(model3). Below is the example by return layer.
這樣就是一個你可以決定每個block要幾層的network,有種樂高的感覺,然後這個範例也解釋了一個輸入(x1)拆到多個輸出(x1, x2)要怎麼打。
這時候有個新問題,如果def不return model而是return tensor (layer)會發生什麼事?答案是你可以更方便的檢視你所有層的連接,相反地model會直接輸出一個block,你可以用model3.summary() or tf.keras.utils.plot_model(model3)確認他的長相,下面是return layer的範例。
"""
def multiple_tensor_graph(input_tensor, depth):
    x1 = layers.Dense(64, activation="relu")(input_tensor)
    for i in range(depth):
        x1 = layers.Dense(64, activation="relu")(x1)
    x1 = layers.Dense(32, activation="relu")(x1)
    return layers.Dense(10)(x1)  # return a "tensor"
x = layers.Input((10, 20), batch_size=None)
x1 = multiple_tensor_graph(x, depth)  # note: since it returns a tensor, it takes
x2 = multiple_tensor_graph(x, depth)  # a tensor input instead of a new layers.Input
outputs = layers.Add()([x1, x2])
model4 = keras.Model(inputs=x, outputs=outputs, name="mnist_model")
model4.summary()
"""The last is class method. The only merit I can found is variable safe.
最後是class法,除了變數安全我真的不知道他好在哪
"""
class myModel(tf.keras.Model):  # the class method means inheriting from keras.Model
    def __init__(self, variable_you_need):  # the functional method doesn't need this line!
        super(myModel, self).__init__()  # the functional method doesn't need this line!
        self.brabrabra = variable_you_need
        self.Avg = layers.AveragePooling1D(1)  # every single layer has to be typed twice!
        self.yBN = layers.BatchNormalization()
        self.pBN = layers.BatchNormalization()
        self.oBN = layers.BatchNormalization()  # separate BN; reusing self.yBN on the concatenated tensor would fail on the different shape
        self.cat = layers.Concatenate()
        self.FC = layers.Dense(1)
        self.AC = layers.Activation('sigmoid')
        self.FL = layers.Flatten(name='A')
    def call(self, inputs):
        y_final, p_final, sa_final = inputs  # when a layer has a dimension error, it's hard to debug!
        y_final = self.Avg(y_final)  # the functional method doesn't need the self. prefix!
        y_final = self.yBN(y_final)
        p_final = self.pBN(p_final)
        outputs = self.cat([y_final, p_final, sa_final])
        outputs = self.oBN(outputs)
        outputs = self.FC(outputs)
        outputs = self.AC(outputs)
        outputs = self.FL(outputs)
        return outputs
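"""A minimal usage sketch for the class above (all shapes here are my own assumptions): the subclassed model is instantiated first and then called on a list of three tensors in the order unpacked inside call()."""
_y = tf.zeros((4, 8, 16))   # (batch, steps, features), arbitrary
_p = tf.zeros((4, 8, 16))
_sa = tf.zeros((4, 8, 16))
class_model = myModel(variable_you_need=None)
print(class_model([_y, _p, _sa]).shape)  # (4, 8) after Dense(1) + Flatten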
"""Summary
1. Sequential for easy model.
2. Functional is the best method that you can decide to build by model or multiple tensor.
3. Class is bad.
Finally we discuss about the difference between build by model and tensor. If you want to check your model with weight or need intermediate output, use tensor to build, otherwise model to build is enough, and you can ch
eck the number of trainable parameter by model.summary() and keras_flops package that tensor can't do it. The last is a training example.
總結
1. Sequantial太簡單用不到。
2. Functional很方便你可以決定要輸出model還是tensor。
3. Class很爛。
最後我們討論用model和跟tensor建的不同,如果常檢查weight或需要中間層的輸出,那用tensor會很方便,不然用model建就足夠了,而且用model你可以檢查可train參數量還能用keras_flops package,tensor沒辦法用這些東西。最後的最後給一個training的範例。
"""
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.fit(YOUR_TRAIN_DATA, YOUR_TRAIN_LABEL,
          epochs=60,
          batch_size=1024,
          validation_split=0.1,
          callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3),
                     tf.keras.callbacks.ReduceLROnPlateau(
                         monitor='val_loss', factor=0.2, patience=5, min_lr=0.001)])
# early stopping once the loss converges, and learning-rate reduction when the validation loss plateaus
"""# Now you have finished the course. Below is not often used trick you can search it long after.
# 看到這邊就算結束了,下面是比較不常用到的東西,之後用到再回來看。
Custom loss:
"""
class bce(tf.keras.losses.Loss):
    def __init__(self, tau=0.5, name='myloss', **kwargs):
        super(bce, self).__init__(name=name, **kwargs)
        self.L1 = tf.losses.BinaryCrossentropy()  # loss 1
    def call(self, y_true, y_pred):
        # call defines what the loss computes
        return self.L1(y_true, y_pred) + tf.math.reduce_mean(tf.pow(tf.abs(y_true - y_pred), 2))
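"""Usage sketch (assuming a single-output model whose labels suit a binary cross-entropy): an instance of the custom loss is passed to compile just like a built-in loss."""
model.compile(optimizer='adam', loss=bce(), metrics=['acc'])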
"""Custom layer:"""
class Beta(layers.Layer):
    # a trainable constant multiplication
    def __init__(self, val):
        super(Beta, self).__init__()
        self.val = val
    def build(self, input_shape):
        # this method pre-defines the weights
        self.beta = self.add_weight('beta', shape=[1, 1], trainable=True,
                                    initializer=tf.constant_initializer(self.val))
    def call(self, inputs):
        # if this layer had multiple inputs:
        # inputs1, inputs2 = inputs
        # do anything you want here
        return tf.math.multiply(self.beta, inputs)
# call it just like the built-in layers
outputs = Beta(0.5)(inputs)
# call it with multiple inputs
# outputs = Beta(0.5)([inputs1, inputs2])
"""Custom callback:"""
import matplotlib.pyplot as plt
class custom_callback(tf.keras.callbacks.Callback):
    # you can even visualize convergence on real data while training
    def __init__(self, YOUR_VAL_DATA):
        super(custom_callback, self).__init__()
        self.task_type = ''
        self.epoch = 0
        self.batch = 0
        self.YOUR_VAL_DATA = YOUR_VAL_DATA
    def on_epoch_end(self, epoch, logs=None):
        label = self.model.predict(self.YOUR_VAL_DATA)
        plt.plot(label)
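"""Usage sketch (YOUR_TRAIN_DATA / YOUR_TRAIN_LABEL / YOUR_VAL_DATA are placeholders): the custom callback is passed to fit() alongside the built-in ones."""
model.fit(YOUR_TRAIN_DATA, YOUR_TRAIN_LABEL,
          epochs=10,
          callbacks=[custom_callback(YOUR_VAL_DATA)])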
"""Pack multiple model as single layer:"""
def pack(tensor_list):
    [a_in, b_in] = tensor_list  # a list containing multiple inputs from the previous layer
    a_out = model(a_in)    # model and model1 are already-built keras.Model objects
    b_out = model1(b_in)
    return [a_out, b_out]
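"""Usage sketch (model and model1 are assumed to be already-built keras.Model objects that accept a (10, 20) input): pack() is called inside a functional graph like any other layer, and the result is wrapped into one combined model."""
a_in = layers.Input((10, 20))
b_in = layers.Input((10, 20))
a_out, b_out = pack([a_in, b_in])
packed = keras.Model(inputs=[a_in, b_in], outputs=[a_out, b_out])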
"""Multiple input output model:"""
inputs1 = layers.Input((10, 20), batch_size=None)
inputs2 = layers.Input((10, 20), batch_size=None)
x1 = layers.Dense(64, activation="relu")(inputs1)
x2 = layers.Dense(64, activation="relu")(inputs2)
x2 = layers.Add()([x1, x2])
outputs1 = layers.Dense(10)(x2)
outputs2 = layers.Dense(1)(x1)
model = keras.Model(inputs=[inputs1, inputs2], outputs=[outputs1, outputs2])
# compile with multiple losses, one per output
model.compile(optimizer='adam',
              loss=['categorical_crossentropy', tf.losses.BinaryCrossentropy()],
              loss_weights=[.8, .2],  # define their weights
              metrics='acc')
# train with multiple inputs and outputs
model.fit([input1, input2], [output1, output2],
          epochs=60,
          batch_size=1024,
          validation_split=0.1,
          callbacks=[reduce_lr, early_stop, your_callbacks()],
          verbose=1)
"""transfer learning:
*https://keras.io/guides/transfer_learning/#freezing-layers-understanding-the-trainable-attribute*
"""
# some_trained_model_or_layer is a placeholder for whatever pre-trained model or layer you load
layer = some_trained_model_or_layer
layer.trainable = False  # freeze the layer
# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account
model.compile(optimizer=keras.optimizers.Adam(1e-5),  # very low learning rate
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])
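"""A minimal sketch of the usual follow-up (base_model and the (10, 20) input shape are assumptions of mine): freeze the pre-trained model, stack a new trainable head on top, then compile and fit as usual. See the Keras guide linked above for details."""
base_model.trainable = False  # freeze all weights of the pre-trained model
inputs = layers.Input((10, 20))
x = base_model(inputs, training=False)  # keep BatchNorm layers in inference mode
outputs = layers.Dense(1)(x)
new_model = keras.Model(inputs, outputs)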
"""Add loss: use it while you want to add loss to a certain layer, even it doesn't has relationship to output label. You can see it on tensorflow offical tutorial for add loss in layer, below is add loss by model method:
當你想要loss加在某一層,甚至他不要跟training data有關係的時候可以用它。你可以去tensorflow官網找在layer加loss的,下面是用model的方法加:
"""
def loss_min_l2norm(outputs):
    return tf.reduce_mean(tf.sqrt(tf.reduce_sum(tf.square(tf.abs(outputs[:, 0, :])), axis=-1))) * 0.01
# layer[0], layer1 and layer2 below are placeholders for intermediate tensors in your model
loss = layers.Lambda(lambda x: loss_min_l2norm(x))(layer[0])
model.add_loss(loss)
# or you can do this
def loss_min_l2norm_with_multi_layer(inputs):
    input1, input2 = inputs
    # input1 and input2 can be different layers' outputs, but they should be correlated;
    # in this example they are pushed to be as similar as possible
    return tf.reduce_mean(tf.reduce_sum(tf.pow(tf.abs(input1 - input2), 2))) * 0.01
loss = layers.Lambda(lambda x: loss_min_l2norm_with_multi_layer(x))([layer1, layer2])
model.add_loss(loss)
"""Re-input to same layer (like iteration in traditional algorithm):"""
poor_guy = layers.Dense(10)  # output dim matches the input dim so the same layer can be re-applied
inputs = layers.Input((1, 10))
a = poor_guy(inputs)
a = poor_guy(a)  # the same layer (shared weights) is applied three times
outputs = poor_guy(a)
model5 = keras.Model(inputs=inputs, outputs=outputs)
tf.keras.utils.plot_model(model5)
"""Generator: when the training data is too big GPU can't accomadate."""
def your_generator():
    while True:
        data, label = YOUR_DATA_INPUT()  # load / build one batch at a time
        yield data, label
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics='acc')
VAL_DATA = your_generator()
model.fit(your_generator(),       # the generator itself decides the batch size
          epochs=100000,
          validation_data=VAL_DATA,
          validation_steps=1,
          steps_per_epoch=5000,   # this defines how many steps make one epoch,
          callbacks=[reduce_lr])  # since fit can't know the total data size
"""Faster training: jit compile (XLA). *See https://www.tensorflow.org/xla/tutorials/autoclustering_xla*"""
# enable it before model.fit; you can turn it off while building the model
# example:
tf.keras.backend.clear_session()
tf.config.optimizer.set_jit(False)
# ...build model
tf.keras.backend.clear_session()
tf.config.optimizer.set_jit(True)
# ...model.fit
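"""If your TF version supports it (roughly 2.5+), there is also a per-model switch (a sketch; optimizer and loss here are placeholders): jit_compile=True in model.compile enables XLA for that model only."""
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              jit_compile=True)  # XLA-compile this model's train/predict functions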
"""Faster training: mixed precision (might cause zero gradient). *See https://www.tensorflow.org/guide/mixed_precision*"""
# this needs to run before any tensors are constructed, ideally as the first lines of the script
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
"""# Questions
**How does layers.Dense(name='...') work in TensorFlow?**
It has nothing to do with the Python variable name; it is just a label that makes your layer easier to find later (e.g. via model.get_layer('layer1')). Without it, layers get auto-generated names like dense_1, dense_2, ... that are hard to tell apart.
---
**Should batch size be pre-defined in layers.Input or set in model.fit?**
The only practical difference: if you pre-define it in layers.Input, get_flops breaks, because it requires batch size = 1.
---
**keras backend or tf.math? (tested up to TF 2.9; TF 2.10 and later not tested)**
tf.math is enough, and you can even mix the two. However, if you build a custom trainable layer out of tf.math(...) ops, setting model.trainable = False will display False but will not actually freeze those weights.
---
**Is keras_flops (get_flops) precise?**
Only roughly. It uses TensorFlow's built-in profiler to count FLOPs, which differs somewhat from the theoretical count. For example, softmax is counted as 5x the input size in FLOPs, whereas in theory it is 3x the input size.
*ref: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/profiler/internal/flops_registry.py*
---
**What's locally connected 1D/2D?**
I don't know either, please tell me.
# The summary of summary
We discussed:
1. Sequential vs. functional vs. class
2. The difference between the places where batch size can be defined
3. What happens when you re-assign the variable of a connected layer
4. Building multiple copies of the same sub-network
5. Building from models vs. from tensors (layers)
6. Multi-input/output models and layers
7. Custom losses/layers/callbacks
8. How to pack multiple models
9. Ways to speed up training
"""