4. Sequence-to-Sequence Models

Introduction to sequence-to-sequence (Seq2Seq) architecture:
Sequence-to-sequence (Seq2Seq) models are a type of neural network architecture designed to handle input and output sequences of variable lengths. They consist of an encoder network that processes the input sequence and a decoder network that generates the output sequence. Seq2Seq models are widely used for tasks such as machine translation, summarization, and speech recognition.
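
To make the division of labor concrete, here is a purely illustrative sketch of the encode-then-decode loop in plain Python; encode_step and decode_step are hypothetical stand-ins for the recurrent networks built in the lab activity below.

        # Illustrative only: encode_step/decode_step stand in for trained RNN cells
        def seq2seq_generate(input_tokens, encode_step, decode_step,
                             start_token, end_token, max_len=20):
            # Encoder: fold the variable-length input into a fixed-size state
            state = None
            for token in input_tokens:
                state = encode_step(token, state)

            # Decoder: emit one token at a time, feeding each prediction back in
            outputs, prev = [], start_token
            for _ in range(max_len):
                prev, state = decode_step(prev, state)
                if prev == end_token:
                    break
                outputs.append(prev)
            return outputs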

Applications of Seq2Seq models:
Seq2Seq models have numerous applications, including:

  • Machine translation: Translating text from one language to another.
  • Summarization: Generating concise summaries of longer text documents.
  • Speech recognition: Converting spoken language into text.
  • Chatbots: Generating responses to user queries in natural language.

Attention mechanism:
The attention mechanism is a component of Seq2Seq models that allows the decoder to focus on specific parts of the input sequence when generating the output sequence. It enables the model to dynamically weigh the importance of different input tokens at each decoding step, improving the model's ability to capture long-range dependencies and handle variable-length sequences effectively.
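
As a rough numerical illustration (a minimal sketch of dot-product attention, not the exact formulation of any particular paper, and separate from the lab code below): at each decoding step the current decoder state is scored against every encoder output, the scores are softmax-normalized into weights, and the weighted sum of encoder outputs becomes the context vector. The array names and sizes here are made up for the example.

        import numpy as np

        def dot_product_attention(dec_state, enc_outputs):
            """dec_state: (hidden,); enc_outputs: (src_len, hidden)."""
            # Score each source position against the current decoder state
            scores = enc_outputs @ dec_state            # (src_len,)
            # Softmax-normalize the scores into attention weights
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()                    # (src_len,)
            # Context vector: attention-weighted sum of encoder outputs
            context = weights @ enc_outputs             # (hidden,)
            return context, weights

        # Toy example: 4 source positions, hidden size 3
        enc_outputs = np.random.randn(4, 3)
        dec_state = np.random.randn(3)
        context, weights = dot_product_attention(dec_state, enc_outputs)
        print("attention weights:", weights.round(3))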

Lab Activity: Implementing a simple Seq2Seq model for machine translation
In this lab activity, we will implement a simple Seq2Seq model using TensorFlow/Keras for machine translation. The steps involved include:

  1. Preprocess the text data: Tokenization, sequence generation.
  2. Build and train the Seq2Seq model: Define the encoder and decoder architectures and train the model on the preprocessed data.
  3. Translate input text using the trained model: Convert input text from one language to another using the trained model.

Code for Lab Activity:
Step 1: Preprocess the text data


        # Import necessary libraries
        import numpy as np
        from tensorflow.keras.preprocessing.text import Tokenizer
        from tensorflow.keras.preprocessing.sequence import pad_sequences

        # Sample parallel text data; the target sentences are wrapped with explicit
        # <start>/<end> tokens so the decoder knows where to begin and stop generating
        input_texts = ["I love coding", "Machine learning is fascinating", "Seq2Seq models are powerful"]
        target_texts = ["<start> J'aime coder <end>",
                        "<start> L'apprentissage automatique est fascinant <end>",
                        "<start> Les modèles Seq2Seq sont puissants <end>"]

        # Tokenization (empty filters so the angle-bracketed markers are not stripped)
        input_tokenizer = Tokenizer()
        input_tokenizer.fit_on_texts(input_texts)
        target_tokenizer = Tokenizer(filters='')
        target_tokenizer.fit_on_texts(target_texts)

        # Generate input-output sequences
        input_sequences = input_tokenizer.texts_to_sequences(input_texts)
        target_sequences = target_tokenizer.texts_to_sequences(target_texts)

        # Padding sequences
        max_input_length = max([len(seq) for seq in input_sequences])
        max_target_length = max([len(seq) for seq in target_sequences])
        X = pad_sequences(input_sequences, maxlen=max_input_length, padding='post')
        y = pad_sequences(target_sequences, maxlen=max_target_length, padding='post')
    
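As an optional sanity check (not part of the original lab steps), you can print the vocabularies and padded array shapes to confirm the preprocessing behaves as expected:

        # Optional check: vocabulary contents and padded array shapes
        print("Input vocab:", input_tokenizer.word_index)
        print("Target vocab size:", len(target_tokenizer.word_index))
        print("X shape:", X.shape, "y shape:", y.shape)
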
Step 2: Build and train the Seq2Seq model

        from tensorflow.keras.models import Model
        from tensorflow.keras.layers import Input, LSTM, Dense, Embedding

        input_vocab_size = len(input_tokenizer.word_index) + 1
        target_vocab_size = len(target_tokenizer.word_index) + 1

        # Define encoder architecture: token IDs -> embeddings -> final LSTM states
        encoder_inputs = Input(shape=(None,))
        encoder_embedding = Embedding(input_vocab_size, 256)(encoder_inputs)
        encoder_lstm = LSTM(256, return_state=True)
        encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
        encoder_states = [state_h, state_c]

        # Define decoder architecture, initialized with the encoder's final states
        decoder_inputs = Input(shape=(None,))
        decoder_embedding = Embedding(target_vocab_size, 256)(decoder_inputs)
        decoder_lstm = LSTM(256, return_sequences=True, return_state=True)
        decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
        decoder_dense = Dense(target_vocab_size, activation='softmax')
        decoder_outputs = decoder_dense(decoder_outputs)

        # Compile model
        model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

        # Train with teacher forcing: the decoder sees the target shifted right
        # (y[:, :-1]) and learns to predict the target shifted left (y[:, 1:])
        model.fit([X, y[:, :-1]], y[:, 1:], epochs=50, verbose=0)
    
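The fit call above uses teacher forcing: the decoder receives the ground-truth target shifted right by one position (y[:, :-1]) and learns to predict the same sequence shifted left by one (y[:, 1:]). Printing the two slices makes the alignment visible; the exact index values depend on the tokenizer.

        # Each decoder input token is paired with the token it should predict next
        print("decoder input  (y[:, :-1]):\n", y[:, :-1])
        print("decoder target (y[:, 1:]):\n", y[:, 1:])
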
Step 3: Translate input text using the trained model

        # Translate input text using the trained model (greedy decoding)
        def translate(input_text):
            input_seq = input_tokenizer.texts_to_sequences([input_text])
            input_seq = pad_sequences(input_seq, maxlen=max_input_length, padding='post')
            # Seed the decoder input with the <start> token and fill it step by step
            output_seq = np.zeros((1, max_target_length), dtype=int)
            output_seq[0, 0] = target_tokenizer.word_index['<start>']
            for i in range(1, max_target_length):
                predictions = model.predict([input_seq, output_seq], verbose=0)
                predicted_index = np.argmax(predictions[0, i-1, :])
                output_seq[0, i] = predicted_index
                if predicted_index == target_tokenizer.word_index['<end>']:
                    break
            # Map indices back to words, skipping padding and the special tokens
            words = []
            for index in output_seq[0]:
                word = target_tokenizer.index_word.get(int(index), '')
                if word not in ('', '<start>', '<end>'):
                    words.append(word)
            return " ".join(words)

        # Test translation
        input_text = "I love coding"
        translated_text = translate(input_text)
        print("Input Text:", input_text)
        print("Translated Text:", translated_text)
    
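With only three sentence pairs and 50 epochs, the model essentially memorizes the training data, so a reasonable smoke test is to translate each training input; words outside this tiny vocabulary are dropped by texts_to_sequences, so unseen sentences will translate poorly.

        # Smoke test on the (tiny) training set; out-of-vocabulary words are ignored
        for sentence in input_texts:
            print(sentence, "->", translate(sentence))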
