How to connect computer vision and natural language processing

Mohamed Bakrey Mahmoud
16 min read · May 8, 2023


  • The input image: consists of a sequence of randomly selected handwritten digits. To make the experiments more interesting, with some probability the sequence may contain blanks.
  • The output language: consists of character-based sequences that describe the content of the image.

Introduction

We all know how widely artificial intelligence has spread in the period we live in, and how important it has become. It is no exaggeration to say that it has reached almost every field, from the economy onward; many domains now depend on it heavily, because it saves a great deal of time, improves results, and speeds up responses.
Today, two of the most important areas of artificial intelligence are computer vision and natural language processing. In this example, we will present an idea for how to link the two fields together in one project and make them cooperate to produce a result that shows the impact of combining them.

What is Computer Vision?

Computer vision is an important field of machine learning. It works on image data: the computer learns from images and video, extracts information from them, and takes actions based on that information.

What is Natural Language Processing (NLP)?

NLP is another member of the AI family. It works on text data: the computer extracts information from text, learns from it, and takes actions based on it.

Here we will work through one way to connect these two fields together. Here we go.

The data we use here is the MNIST dataset, which consists of images of handwritten digits.

The idea is to build a model that extracts information from an image and describes, in text, what the image contains.

We will connect a CNN (to encode the image) with an RNN (to generate the text) to realize this idea, as sketched below.
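Before diving into the full code, here is a minimal sketch of the wiring we will build; the layer sizes are placeholders that only roughly mirror the final model:

from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense, LSTM
from tensorflow.keras.models import Model

# CNN encoder: reads the image sequence and summarizes it into two state vectors
img_in = Input(shape=(28, 84, 1))                   # three 28x28 digits placed side by side
feat = Flatten()(Conv2D(8, (3, 3), activation='relu', padding='same')(img_in))
state_h, state_c = Dense(96)(feat), Dense(96)(feat)

# RNN (LSTM) decoder: starts from the image states and emits characters one at a time
char_in = Input(shape=(None, 18))                   # 18 = size of the character vocabulary
seq, _, _ = LSTM(96, return_sequences=True, return_state=True)(char_in, initial_state=[state_h, state_c])
char_out = Dense(18, activation='softmax')(seq)

# Image + previously generated characters -> probabilities of the next character
sketch = Model([img_in, char_in], char_out)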

Now we can go to the implementation.

Implementation

First, import the required libraries.

# Keras: used to create the CNN and RNN models
import tensorflow.keras
# The datasets module gives us the MNIST data we will work with
from tensorflow.keras.datasets import mnist
# Sequential model API (not used below; we build the model with the functional API)
from tensorflow.keras.models import Sequential
# Import the layers used in our model
from tensorflow.keras.layers import Dense, Dropout, Flatten, LSTM
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input
# What is the "backend"? Keras is a model-level library that provides high-level building blocks for developing deep learning models. It does not itself handle low-level operations such as tensor products, convolutions, and so on; those are delegated to the backend.
from tensorflow.keras import backend as K
# Model class used to build the model with the functional API
from tensorflow.keras.models import Model
# plot_model draws the structure of the model
from tensorflow.keras.utils import plot_model
# NumPy for arrays, matrices, and math operations
import numpy as np
# Matplotlib for plotting figures
import matplotlib.pyplot as plt
# PIL (short for Python Imaging Library) for working with images
from PIL import Image

Second, load the dataset that we will use in this project:

(x_train,y_train),(x_test,y_test)= mnist.load_data()

The output:

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step

Check the shape of the data we have:

print("The shape of X_Train", x_train.shape)
print("The shape of the y_train",y_train.shape)
print("Shape of X_test", x_test.shape)
print("Shape of y_test", y_test.shape)

output:

The shape of X_Train (60000, 28, 28)
The shape of the y_train (60000,)
Shape of X_test (10000, 28, 28)
Shape of y_test (10000,)

We check the image data format used by the backend.

K.image_data_format()

output:

channels_last

What is channels_last?

Channels-last memory format is an alternative way of ordering image tensors in memory: instead of NCHW (batch, channels, height, width), the data is stored as NHWC, so the channel dimension becomes the densest, fastest-varying one. This is also known as storing images pixel-by-pixel.

Here we will work assuming channels_last.
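To make the difference concrete, here is a minimal sketch (the array names are only illustrative):

import numpy as np

# One grayscale 28x28 image
img = np.zeros((28, 28), dtype='float32')

# channels_last (NHWC): the channel axis comes last
nhwc = img.reshape(1, 28, 28, 1)   # (batch, height, width, channels)

# channels_first (NCHW): the channel axis comes right after the batch axis
nchw = img.reshape(1, 1, 28, 28)   # (batch, channels, height, width)

print(nhwc.shape, nchw.shape)      # (1, 28, 28, 1) (1, 1, 28, 28)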

# Set the height and width of our images
img_rows, img_cols = 28, 28
# Reshape the training data
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
# Reshape the test data (the trailing 1 means the images are grayscale)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)

Here we change the image data type to float and normalize the pixel values:

# Convert the training images to float32
x_train = x_train.astype('float32')
# Convert the test images to float32
x_test = x_test.astype('float32')
# Normalize the training data to [0, 1]
x_train /= 255
# Normalize the test data to [0, 1]
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

The output after reshaping the images:

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples

As we can see, the channel dimension has been added to the x_train shape, and we have 60,000 training samples and 10,000 test samples.

Here we set the text name for each digit from zero to nine.

digit_text = ['zero.', 'one.', 'two.', 'three.', 'four.', 'five.', 'six.', 'seven.', 'eight.', 'nine.']

Now we generate the image-sequence dataset.

# Probability of generating a short sequence, where blank spaces are randomly inserted into the image sequence
short_seq_chance = 0.7
# Length of each sequence
seq_len = 3
# Number of samples to generate; y_train is the NumPy array containing the labels of the dataset
num_samples = y_train.shape[0]
# Empty list to store the text label of each sequence
y_nums_text = []
# x_m_train stores the image sequences; each new image has shape (img_rows, seq_len*img_cols)
x_m_train = np.zeros((num_samples, img_rows, seq_len*img_cols), dtype='float32')

for i in range(num_samples):
    rand_digits = np.random.randint(len(y_train), size=seq_len)  # three random digit indices
    x_m_train[i] = np.hstack((x_train[rand_digits[0]].squeeze(),
                              x_train[rand_digits[1]].squeeze(),
                              x_train[rand_digits[2]].squeeze()))
    digits_srts = []
    for s in range(seq_len):
        digits_srts.append(digit_text[y_train[rand_digits[s]]])

    # Insert blanks randomly
    if np.random.uniform() < short_seq_chance:
        mask = np.random.randint(2, size=seq_len)  # mask of 0s and 1s
        for pos, m in enumerate(mask):
            if m == 0:  # remove this position from the sequence
                x_m_train[i, :, pos*img_cols:(pos*img_cols)+img_cols] = 0
                digits_srts[pos] = ''

    # Construct the final label
    label = ''
    for l in digits_srts:
        label = label + l
    y_nums_text.append(label)

What happens in this loop:

For each sequence, we select seq_len random digits from the training labels y_train, and their corresponding images are retrieved from the input image dataset x_train. The images are then concatenated horizontally and stored in x_m_train. The corresponding text labels of the digits are also retrieved and stored in digits_srts.

If a short sequence is selected, a random binary mask of length seq_len is created. For each position in the mask, if the value is 0, the corresponding image in x_m_train is set to zero and the corresponding text label in digits_srts is set to an empty string. This creates a blank space in the sequence. Finally, the text labels of all the digits in digits_srts are concatenated into label, which is appended to the list y_nums_text to store the text label for the current sequence.

After the loop completes, y_nums_text and x_m_train contain, respectively, the text labels and the image sequences of the generated dataset.
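For example, if the three sampled digits were 3, 5, and 1 and the mask zeroed out the middle position, the stored image would have a blank column in the middle and the label would be 'three.one.'. The same logic can be wrapped in a small helper function; this is just an illustrative sketch (the function name is not part of the original code):

def make_sequence_sample(x_train, y_train, digit_text, seq_len=3, short_seq_chance=0.7, img_cols=28):
    # Build one (image, label) pair: seq_len digits side by side, some possibly blanked out
    idx = np.random.randint(len(y_train), size=seq_len)
    image = np.hstack([x_train[j].squeeze() for j in idx])
    labels = [digit_text[y_train[j]] for j in idx]
    if np.random.uniform() < short_seq_chance:
        for pos, keep in enumerate(np.random.randint(2, size=seq_len)):
            if keep == 0:
                image[:, pos*img_cols:(pos+1)*img_cols] = 0
                labels[pos] = ''
    return image, ''.join(labels)

# Example usage:
# img, label = make_sequence_sample(x_train, y_train, digit_text)
# print(label)   # e.g. 'three.one.' if the middle digit was blanked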

Now we can display some of the images we generated:

# Number of sequences to display
n = 10
plt.figure(figsize=(3, 20))
for i in range(n):
    image = x_m_train[i]

    ax = plt.subplot(n, 1, i + 1)
    plt.imshow(image.reshape(28, seq_len*28))
    plt.gray()
    plt.title(y_nums_text[i])
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.show()

Output from this code:

This is the output from the code.

As we can see in the output, each image has a title showing the text that describes what the image contains.

target_texts = []
target_characters = set()

for nt in y_nums_text:
    t_nt_n = '\t' + nt + '\n'
    target_texts.append(t_nt_n)
    for char in t_nt_n:
        if char not in target_characters:
            target_characters.add(char)

print(list(target_texts[:20]))

Output for the code:

['\teight.six.\n', '\teight.nine.zero.\n', '\t\n', '\tseven.five.one.\n', '\tfour.six.four.\n', '\tseven.\n', '\teight.one.\n', '\tfive.one.\n', '\t\n', '\tnine.six.one.\n', '\tseven.\n', '\tsix.\n', '\tzero.six.zero.\n', '\tone.zero.\n', '\tone.\n', '\t\n', '\tnine.zero.one.\n', '\ttwo.one.\n', '\tone.three.two.\n', '\ttwo.four.\n']

To explain the code: it prepares the text labels generated above for use in a sequence-to-sequence (seq2seq) model. Specifically, it builds a list of text sequences, where each sequence is the original text label with a start-of-sequence (`\t`) token prepended and an end-of-sequence (`\n`) token appended. The code also collects the set of all unique characters present in the target text sequences by iterating through each sequence and each of its characters; any character not already in the `target_characters` set is added to it. This set will be used to create a character-level vocabulary for the seq2seq model, with each unique character represented by a unique index, and it will later determine the size of the model's output vocabulary.
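As a small worked example of the wrapping (the variable names here are only illustrative):

label = 'three.one.'
wrapped = '\t' + label + '\n'    # '\tthree.one.\n'
chars = sorted(set(wrapped))     # ['\t', '\n', '.', 'e', 'h', 'n', 'o', 'r', 't']
print(repr(wrapped), chars)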

# Sort the characters and store them as a list
target_characters = sorted(list(target_characters))
# Number of decoder tokens = size of the character vocabulary
num_decoder_tokens = len(target_characters)
# Maximum decoder sequence length over all target texts
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of text samples:', len(target_texts))
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for outputs:', max_decoder_seq_length)

Output for the code:

Number of text samples: 60000
Number of unique output tokens: 18
Max sequence length for outputs: 20

In the next code block, a dictionary target_token_index is created to map each character in the target_characters set to a unique integer index; a dictionary comprehension is used, with each key-value pair consisting of a character and its corresponding index. Next, two NumPy arrays, decoder_input_data and decoder_target_data, are initialized to zeros with dimensions (num_samples, max_decoder_seq_length, num_decoder_tokens), where num_samples is the number of sequences in the dataset, max_decoder_seq_length is the maximum length of the output sequence, and num_decoder_tokens is the number of unique characters in the target vocabulary. These arrays hold the input and target data for the seq2seq model: decoder_input_data is fed to the decoder, while decoder_target_data is the expected output of the decoder.
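For the vocabulary generated above, the mapping should look roughly as follows; this is an illustrative sketch rather than captured output, since the exact indices depend on the sorted character order:

example_chars = sorted(['\t', '\n', '.', 'z', 'e', 'r', 'o', 'n', 't', 'w', 'h', 'f', 'u', 'i', 'v', 's', 'x', 'g'])
example_token_index = {ch: i for i, ch in enumerate(example_chars)}
print(example_token_index['\t'])   # 0  -> the start-of-sequence token gets the lowest index
print(len(example_token_index))    # 18 -> matches num_decoder_tokens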

# char to index 
target_token_index = dict([(char, i) for i, char in enumerate(target_characters)])
decoder_input_data = np.zeros((y_train.shape[0], max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros((y_train.shape[0], max_decoder_seq_length, num_decoder_tokens), dtype='float32')
for i, txt in enumerate(target_texts):  # one wrapped text label per training sample
    target_text = txt
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.

As we can see, this code populates the `decoder_input_data` and `decoder_target_data` arrays with the one-hot encoded input and output sequences. For each sequence in `target_texts`, and for each character in that sequence, the character's integer index from the `target_token_index` dictionary is set to 1 at the appropriate position in `decoder_input_data`. For every character except the first, the same index is also set to 1 in `decoder_target_data`, but at the previous time step (t - 1). This is because the model is trained to predict each character of the output sequence given the previous characters, so the input and output sequences are offset by one time step. The `enumerate` function provides both the index `i` of each sequence in `target_texts` and the index `t` of each character within it, and the one-hot encoding is written by indexing the arrays with `i`, `t`, and `target_token_index[char]`.
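A quick way to sanity-check the encoding is to decode one row back into text. This is a small sketch; reverse_index is a throwaway helper here (the article builds its own reverse lookup later):

reverse_index = {i: ch for ch, i in target_token_index.items()}

# Rebuild the first sample's input and target strings from the one-hot arrays
inp = ''.join(reverse_index[np.argmax(step)] for step in decoder_input_data[0] if step.any())
tgt = ''.join(reverse_index[np.argmax(step)] for step in decoder_target_data[0] if step.any())

print(repr(inp))  # starts with '\t' and ends with '\n'
print(repr(tgt))  # the same text shifted left by one step: no leading '\t'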

Here we build the encoder using the functional API.

latent_dim = 96

# Define the encoder
# Input shape for the concatenated image sequence
input_shape = (img_rows, seq_len*img_cols, 1)
# Input layer of the model
input_img = Input(shape=input_shape)
# First convolutional layer with ReLU activation
x = Conv2D(8, (3, 3), activation='relu', padding='same')(input_img)
# First max-pooling layer
x = MaxPooling2D((2, 2), padding='same')(x)
# Second convolutional layer
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
# Second max-pooling layer
x = MaxPooling2D((2, 2), padding='same')(x)
# Third convolutional layer
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
# Flatten layer
x = Flatten()(x)
# Dense layers producing the initial hidden and cell states for the decoder LSTM
state_h = Dense(latent_dim)(x)
state_c = Dense(latent_dim)(x)
encoder_states = [state_h, state_c]

# Build the encoder model from the input image to the encoder states
image_encoder = Model(input_img, encoder_states)

# Show the model summary
image_encoder.summary()

The model summary output:

Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 28, 84, 1)] 0 []

conv2d (Conv2D) (None, 28, 84, 8) 80 ['input_1[0][0]']

max_pooling2d (MaxPooling2D) (None, 14, 42, 8) 0 ['conv2d[0][0]']

conv2d_1 (Conv2D) (None, 14, 42, 16) 1168 ['max_pooling2d[0][0]']

max_pooling2d_1 (MaxPooling2D) (None, 7, 21, 16) 0 ['conv2d_1[0][0]']

conv2d_2 (Conv2D) (None, 7, 21, 16) 2320 ['max_pooling2d_1[0][0]']

flatten (Flatten) (None, 2352) 0 ['conv2d_2[0][0]']

dense (Dense) (None, 96) 225888 ['flatten[0][0]']

dense_1 (Dense) (None, 96) 225888 ['flatten[0][0]']

==================================================================================================
Total params: 455,344
Trainable params: 455,344
Non-trainable params: 0

Set the decoder for the project:

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True, name="decoder")
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# encoder_input_data & decoder_input_data into decoder_target_data
model = Model([input_img, decoder_inputs], decoder_outputs)
print(model.summary())

output of the model:

Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 28, 84, 1)] 0 []

conv2d (Conv2D) (None, 28, 84, 8) 80 ['input_1[0][0]']

max_pooling2d (MaxPooling2D) (None, 14, 42, 8) 0 ['conv2d[0][0]']

conv2d_1 (Conv2D) (None, 14, 42, 16) 1168 ['max_pooling2d[0][0]']

max_pooling2d_1 (MaxPooling2D) (None, 7, 21, 16) 0 ['conv2d_1[0][0]']

conv2d_2 (Conv2D) (None, 7, 21, 16) 2320 ['max_pooling2d_1[0][0]']

flatten (Flatten) (None, 2352) 0 ['conv2d_2[0][0]']

input_2 (InputLayer) [(None, None, 18)] 0 []

dense (Dense) (None, 96) 225888 ['flatten[0][0]']

dense_1 (Dense) (None, 96) 225888 ['flatten[0][0]']

decoder (LSTM) [(None, None, 96), 44160 ['input_2[0][0]',
(None, 96), 'dense[0][0]',
(None, 96)] 'dense_1[0][0]']

dense_2 (Dense) (None, None, 18) 1746 ['decoder[0][0]']

==================================================================================================
Total params: 501,250
Trainable params: 501,250
Non-trainable params: 0
_________________________________________________________________________________________________
# model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In the code above we compile the model. The commented-out line shows an alternative optimizer, 'rmsprop', while the active line uses 'adam'. We set the loss to 'categorical_crossentropy' and add 'accuracy' as the metric used to measure the model.

plot_model(model, to_file="model.png", show_shapes=True)
img = Image.open('./model.png')
img

Here we plot the model architecture:

This is the plot of the model that we obtain after building it.

Now we move on to the final step, training the model:

# Fit the model on the first 10,000 samples
history = model.fit([x_m_train[:10000], decoder_input_data[:10000]], decoder_target_data[:10000], batch_size=32, epochs=20, validation_split=0.2)

The training output for each epoch:

Epoch 1/20
250/250 [==============================] - 17s 13ms/step - loss: 1.1935 - accuracy: 0.1675 - val_loss: 1.0735 - val_accuracy: 0.2173
Epoch 2/20
250/250 [==============================] - 2s 7ms/step - loss: 0.9296 - accuracy: 0.2669 - val_loss: 0.7846 - val_accuracy: 0.3374
Epoch 3/20
250/250 [==============================] - 2s 7ms/step - loss: 0.5720 - accuracy: 0.4010 - val_loss: 0.4392 - val_accuracy: 0.4446
Epoch 4/20
250/250 [==============================] - 2s 7ms/step - loss: 0.3566 - accuracy: 0.4668 - val_loss: 0.3073 - val_accuracy: 0.5024
Epoch 5/20
250/250 [==============================] - 2s 7ms/step - loss: 0.2627 - accuracy: 0.5123 - val_loss: 0.2385 - val_accuracy: 0.5187
Epoch 6/20
250/250 [==============================] - 2s 7ms/step - loss: 0.2128 - accuracy: 0.5200 - val_loss: 0.1999 - val_accuracy: 0.5218
Epoch 7/20
250/250 [==============================] - 2s 8ms/step - loss: 0.1817 - accuracy: 0.5241 - val_loss: 0.1841 - val_accuracy: 0.5256
Epoch 8/20
250/250 [==============================] - 2s 8ms/step - loss: 0.1568 - accuracy: 0.5274 - val_loss: 0.1565 - val_accuracy: 0.5279
Epoch 9/20
250/250 [==============================] - 2s 7ms/step - loss: 0.1392 - accuracy: 0.5295 - val_loss: 0.1446 - val_accuracy: 0.5296
Epoch 10/20
250/250 [==============================] - 2s 7ms/step - loss: 0.1270 - accuracy: 0.5314 - val_loss: 0.1402 - val_accuracy: 0.5300
Epoch 11/20
250/250 [==============================] - 2s 7ms/step - loss: 0.1753 - accuracy: 0.5142 - val_loss: 0.3555 - val_accuracy: 0.4637
Epoch 12/20
250/250 [==============================] - 2s 7ms/step - loss: 0.1410 - accuracy: 0.5266 - val_loss: 0.1283 - val_accuracy: 0.5324
Epoch 13/20
250/250 [==============================] - 2s 7ms/step - loss: 0.1090 - accuracy: 0.5334 - val_loss: 0.1171 - val_accuracy: 0.5335
Epoch 14/20
250/250 [==============================] - 2s 10ms/step - loss: 0.1002 - accuracy: 0.5344 - val_loss: 0.1134 - val_accuracy: 0.5332
Epoch 15/20
250/250 [==============================] - 2s 7ms/step - loss: 0.0938 - accuracy: 0.5350 - val_loss: 0.1104 - val_accuracy: 0.5329
Epoch 16/20
250/250 [==============================] - 2s 7ms/step - loss: 0.0882 - accuracy: 0.5355 - val_loss: 0.1042 - val_accuracy: 0.5338
Epoch 17/20
250/250 [==============================] - 2s 7ms/step - loss: 0.0828 - accuracy: 0.5360 - val_loss: 0.1014 - val_accuracy: 0.5337
Epoch 18/20
250/250 [==============================] - 2s 7ms/step - loss: 0.0795 - accuracy: 0.5361 - val_loss: 0.0997 - val_accuracy: 0.5335
Epoch 19/20
250/250 [==============================] - 2s 7ms/step - loss: 0.0794 - accuracy: 0.5359 - val_loss: 0.0987 - val_accuracy: 0.5336
Epoch 20/20
250/250 [==============================] - 2s 7ms/step - loss: 0.0758 - accuracy: 0.5363 - val_loss: 0.0961 - val_accuracy: 0.5340

Here we plot the learning curve for the trained model:

plt.figure(figsize=(10, 7))
# Plot the training loss
plt.plot(history.history['loss'], label='Train')
# Plot the validation loss
plt.plot(history.history['val_loss'], label='Test')

# Set the title of the figure
plt.title('LSTM Learning curve')
# Set the y-axis label
plt.ylabel('Loss')
# Set the x-axis label
plt.xlabel('Epoch')

plt.legend()

Output for the code:

This figure plots the training loss and the validation loss.

Now we can define the sampling (inference) models:

# Define sampling models
encoder_model = Model(input_img, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())

Here we define a function that decodes an image into text.

def decode_image(image):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(image)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

Here we plot some images after testing the model:

n = 10  # how many digits we will display
plt.figure(figsize=(3, 25))
for i in range(n):
    image = x_m_train[i+2000].copy()
    ax = plt.subplot(n, 1, i + 1)
    plt.imshow(image.reshape(28, seq_len * 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    plt.title(decode_image(np.expand_dims(image, axis=0)))

plt.show()

Output of the model:

1/1 [==============================] - 0s 19ms/step
1/1 [==============================] - 0s 18ms/step
1/1 [==============================] - 0s 22ms/step
...

Each of these progress lines corresponds to one predict call: one for the encoder and one for every decoded character, so the full log repeats many similar lines. The resulting figure shows each image sequence with the model's decoded text as its title.

Conclusions

In this article, I presented an idea that shows how to connect computer vision and NLP, and implemented the code to show how it works. As we can see, the results are quite good.
My regards.
