
[LLM] ν™”μƒνšŒμ˜ 쀑 STT to TTS μˆ˜ν–‰ν•˜λŠ” μ‹œμŠ€ν…œ 섀계 - 1. OpenAI API 'Whisper-1' ν™œμš©ν•˜μ—¬ μ‹€μ‹œκ°„ STT κ΅¬ν˜„


πŸ—£οΈ STT?

Short for Speech-To-Text: the task of converting speech into text.

Who doesn't know that, you say.

 

For our 'HQ-to-local-factory trouble-handling support system', I ended up in charge of designing and implementing the real-time STT - translation - TTS pipeline that runs during video calls (honestly, STT looked fun, so I volunteered), and I decided to start small with the 'real-time STT system' part by designing a simple (1,823,641-line) program.

 

μ‚¬μš© λͺ¨λΈ

The course I'm currently taking hands out OpenAI API keys (Scala, you're the best), so among the STT models I went with 'whisper-1': token-efficient, and around long enough that there are plenty of coding references for it.

 

πŸ”— Whisper-1 official API docs: https://platform.openai.com/docs/models/whisper-1

 

I knew Google, Papago, AWS, and others offer various STT APIs,

OpenAI λͺ¨λΈ μ€‘μ—μ„œλ„ 비ꡐ적 μ΅œκ·Όμ— λ‚˜μ˜¨ 'GPT-4o-Audio' λ“± λ‹€λ₯Έ νŒ€λ“€ν•œν…Œ 이것저것 μ£Όμ›Œλ“€μ€ 것은 λ§Žμ€λ°..

 

Anyway, I wanted to make the most of an environment that practically spoon-feeds you OpenAI API access,

so, weighing 'multilingual support' + 'real-time performance' + 'a local test environment that isn't exactly great' all at once,

 

I decided to go with either a lightweight open-source model or whichever OpenAI API model worked better, making the most of what I was given.

 

However..

 

GPT-4o-Audio, first of all, has absurdly expensive token pricing.

아무리 λ‚¨μ˜ λˆμ΄λΌμ§€λ§Œ

my conscience still stings πŸ‘‰πŸ‘ˆ

 

Open-source Whisper: https://github.com/openai/whisper

λ˜ν•œ μ‹€μ‹œκ°„μ„±μ„ κ³ λ €ν•˜μ—¬ μ˜€ν”ˆμ†ŒμŠ€ whisper λͺ¨λΈμ„ μ‚¬μš©ν•˜λ €κ³  ν–ˆλ”λ‹ˆ 'small'λΆ€ν„°λŠ” 느리고 + 컴퓨터가 ν„°μ§€λ €ν•˜κ³ (M1 ν•™λŒ€ γ…œγ…œ) 'base' λŠ” μ„±λŠ₯이 λ„ˆλ¬΄ λ–¨μ–΄μ‘Œλ‹€.

Please ignore the errors πŸ˜©
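
For reference, here's a minimal sketch of how I ran the open-source models locally (assuming `pip install openai-whisper`; `sample.wav` is a placeholder path, not from the final project code):

import whisper

# Load a local checkpoint; "small" and up were too slow on my M1,
# and "base" transcribed poorly.
model = whisper.load_model("base")
result = model.transcribe("sample.wav", language="ko")
print(result["text"])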

 

ν”„λ‘œμ νŠΈ 쀑간에 μ‹€μ œλ‘œ whisper baseλͺ¨λΈμ„ ν™œμš©ν•œ stt to tts μ‹€μŠ΅λ„ μ§„ν–‰ν•œ 적이 μžˆμ—ˆλŠ”λ°...

 

(screenshot of the output) What I actually said was: 'favorite Korean ramen?'

 

이건 μ§‘μ—μ„œ μ‘°μš©ν• λ•Œ ν…ŒμŠ€νŠΈν•΄λ³Έκ±°κ³ ..

μ†ŒμŒμ΄ μ–΄λŠμ •λ„ μžˆλŠ” κ΅μœ‘ν™˜κ²½μ—μ„œ ν…ŒμŠ€νŠΈ ν•΄λ³Όλ•ŒλŠ” ν•œκΈ€λ‘œ λ²ˆμ—­μ΄ μ•ˆλ˜κ³  μ΄μƒν•œ μ–Έμ–΄λ‘œ νŠ€μ–΄λ²„λ¦¬λ”λΌ

 

λ˜λ‹€λ₯Έ μ˜΅μ…˜μœΌλ‘œ κ²½λŸ‰ν™”λœ 버전인 faster_whisper도 μžˆμ—ˆμœΌλ‚˜..

but without a GPU it was, somehow, even slower than calling the API (around 30 seconds) + I genuinely thought my laptop would die.

For the record, I never said 'butt'.

μ΄λ”΄κ²Œ STT?

 

Still, the API was the one that produced the most human-sounding output.. so in the end, Whisper-1 wins.

 


Purpose

ν•΄λ‹Ή ν”„λ‘œμ νŠΈ μžμ²΄κ°€ ν™”μƒνšŒμ˜μ—μ„œ μ§„ν–‰λ˜λŠ” 것이기 λ•Œλ¬Έμ—, ν™”μƒνšŒμ˜μ—μ„œ λ°œμƒν•  수 μžˆλŠ” μ œμ•½μ‚¬ν•­μ„ κ³ λ €ν•œ STT μ‹œμŠ€ν…œ κ΅¬ν˜„μ΄ ν•„μš”ν–ˆλ‹€.

 

 

ν™”μƒνšŒμ˜μ΄κΈ° λ•Œλ¬Έμ—, 무엇보닀 '길이와 상관 μ—†λŠ”' μ‹€μ‹œκ°„μ„±μ΄ μ€‘μš”ν•˜λ‹€.

Given the latency of the translation and TTS stages that follow, waiting for the speaker to finish and then shipping the whole sentence to the server seemed pointless, so I aimed for a system that runs as the following process (a rough thread-and-queue skeleton follows the list):

 

1. Use multithreading: one thread collects audio, another performs transcription.

2. The audio thread pushes freshly recorded data, in chunks of 0.5 to 1 second, onto a queue for transcription.

3. As soon as data arrives on the transcription queue, transcribe it immediately.

4. When silence between sentences is detected, treat the transcription accumulated so far as one complete sentence and hand it off to the translation task.
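
Here's a minimal, runnable sketch of that producer-consumer layout (the bodies are stand-ins, not the project code; the real implementation appears below):

import queue
import threading
import time

audio_queue = queue.Queue()  # chunks flow from the mic thread to the STT thread

def audio_collector():
    while True:
        time.sleep(0.5)                  # stand-in for recording ~0.5 s of audio
        audio_queue.put(b"\x00" * 8000)  # step 2: enqueue the chunk

def stt_worker():
    while True:
        chunk = audio_queue.get()                  # step 3: transcribe on arrival
        print(f"transcribing {len(chunk)} bytes")  # stand-in for the Whisper call

for target in (audio_collector, stt_worker):
    threading.Thread(target=target, daemon=True).start()

time.sleep(2)  # let the demo run briefly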

 

 

μ΄λ ‡κ²ŒκΉŒμ§€ ν•΄μ„œ μ½”λ“œμ˜ 기초λ₯Ό μ§°κ³ ,

and with (a lot of) help from GPT I was able to implement it all the way to code you can watch run in the console.

 

Code

 

main

if __name__ == "__main__":
    try:
        clear_screen()

        # One thread collects audio, the other transcribes it
        t1 = threading.Thread(target=audio_collection_thread)
        t2 = threading.Thread(target=stt_processing_thread)

        # Daemon threads exit together with the main program
        t1.daemon = True
        t2.daemon = True

        t1.start()
        t2.start()

        update_captions()

        while True:
            time.sleep(0.1)

    except KeyboardInterrupt:
        clear_screen()
        print("\nπŸ›‘ Shutting down...")
        time.sleep(0.5)
        print("πŸ‘‹ Done")

 

1λ²ˆμ—μ„œ λ§ν–ˆλ“―, μ‹€μ‹œκ°„μ„± κ΅¬ν˜„μ„ μœ„ν•΄ audio_collection_thread μ—μ„œλŠ” μŒμ„±μ„ μˆ˜μ§‘ν•˜κ³  stt_processing_threadμ—μ„œλŠ” 전사(STT)λ₯Ό μˆ˜ν–‰ν•˜λŠ” ν•¨μˆ˜λ₯Ό κ΅¬ν˜„ν•΄μ€¬λ‹€.

The threads are set as daemons so their tasks stop when the program exits,

μ’…λ£Œλ˜κΈ° μ „κΉŒμ§€ 0.1μ΄ˆλ§ˆλ‹€ μƒˆλ‘œ 뢈러올 수 있게 sleep을 κ±Έμ–΄μ£Όλ©° μ‹€μ‹œκ°„ STTλ₯Ό μˆ˜ν–‰ν•œλ‹€.

 

stt_processing_thread

# STT processing thread (current OpenAI Whisper API)
def stt_processing_thread():
    global current_caption
    buffer = np.zeros((0, 1), dtype=np.float32)
    max_buffer_size = samplerate * 5  # hard cap: keep at most 5 s of audio

    try:
        while True:
            try:
                data = audio_queue.get(timeout=1)
                buffer = np.concatenate((buffer, data), axis=0)

                # Drop the oldest samples if the buffer overflows
                if len(buffer) > max_buffer_size:
                    buffer = buffer[-max_buffer_size:]

                chunk_size = int(samplerate * 3.0)  # transcribe in 3 s chunks
                if len(buffer) >= chunk_size:
                    # Write the chunk to a temporary .wav and send it to the API
                    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
                        sf.write(f.name, buffer[:chunk_size], samplerate)
                        audio_file = open(f.name, "rb")
                        response = client.audio.transcriptions.create(
                            model="whisper-1",
                            file=audio_file,
                            language="ko",
                            # "This is a meeting. Write down what is being said, clearly."
                            prompt="회의 μ€‘μž…λ‹ˆλ‹€. λ˜λ°•λ˜λ°• λ§ν•˜λŠ” λ‚΄μš©μ„ 받아적어.",
                        )
                        audio_file.close()
                        os.unlink(f.name)

                    text = response.text.strip()
                    if text:
                        with caption_lock:
                            # Start a new caption at sentence boundaries, else append
                            if not current_caption or text[0].isupper() or any(current_caption.endswith(p) for p in ['.', '!', '?', '。', '!', '?']):
                                if current_caption:
                                    caption_history.append(current_caption)
                                current_caption = text
                            else:
                                current_caption += " " + text
                        update_captions()
                        buffer = np.zeros((0, 1), dtype=np.float32)

                audio_queue.task_done()
            except queue.Empty:
                continue
    except KeyboardInterrupt:
        pass

 

Rather than detecting silence as such, what actually happens here is that the audio stream stays open

and chunks keep piling into the buffer variable until it holds (sample rate * 3) samples, i.e. 3 seconds' worth, at which point transcription runs. (To guard against overflow, the maximum size is declared as 5 seconds' worth; the buffer collects 3 seconds at a time and is then flushed.)
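 

In concrete numbers (just restating the constants used in the code):

samplerate = 16000                  # 16 kHz mono input
chunk_size = int(samplerate * 3.0)  # 48,000 samples = 3 s per transcription
max_buffer_size = samplerate * 5    # 80,000 samples = 5 s overflow cap
block_size = 4000                   # 4,000 samples = 0.25 s per audio callback
print(chunk_size // block_size)     # 12 callbacks fill one 3 s chunk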

 

The transcription step itself works like this:

write the chunk to a temporary .wav file and pass that audio to the 'whisper-1' model,

store only the text part of the model's response in text,

then delete the temporary file.

 

Easy, right?

 

Below that, the code takes the caption lock, touches the shared caption state, and appends the transcription result to the text being displayed.

 

audio_collection_thread

# Audio collection thread
def audio_collection_thread():
    try:
        with sd.InputStream(samplerate=samplerate, channels=1, 
                          callback=audio_callback, blocksize=block_size):
            print("πŸŽ™οΈ Starting real-time STT... one moment, please.")
            while True:
                time.sleep(0.1)
    except Exception as e:
        print(f"Audio stream error: {e}", file=sys.stderr)
    except KeyboardInterrupt:
        pass

 

This one, honestly..

is just a function that opens the audio stream and records.

 

mac ν™˜κ²½μ—μ„œ μ§„ν–‰ν–ˆμ„λ•ŒλŠ”,

sounddevice μ°Ύμ•„μ£ΌλŠ”κ²Œ 제일 νž˜λ“€μ—ˆλ‹€
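
In case it saves someone the same pain, sounddevice can list what it sees; the device index in the commented line is illustrative, not from my setup:

import sounddevice as sd

print(sd.query_devices())  # every input/output device PortAudio can find
print(sd.default.device)   # the (input, output) device indices currently in use

# To pin a specific microphone, set its index explicitly, e.g.:
# sd.default.device = (1, None)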

 

Full code
import sounddevice as sd
import numpy as np
from openai import OpenAI
import queue
import threading
import time
import os
import sys
import tempfile
import soundfile as sf
from collections import deque
from dotenv import load_dotenv

load_dotenv()

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

# Create the OpenAI Whisper API client (reads OPENAI_API_KEY from the environment)
client = OpenAI()

# Queue shared between the audio and STT threads
audio_queue = queue.Queue()

# Audio settings
samplerate = 16000
block_size = 4000  # 0.25 s of audio per callback

# Caption state
caption_history = deque(maxlen=5)  # keep the 5 most recent sentences
current_caption = ""
caption_lock = threading.Lock()

# Audio callback: push each recorded block onto the queue
def audio_callback(indata, frames, time, status):
    if status:
        print(f"Status: {status}", file=sys.stderr)
    audio_queue.put(indata.copy())

# Clear the terminal
def clear_screen():
    os.system('cls' if os.name == 'nt' else 'clear')

# μžλ§‰ 좜λ ₯ ν•¨μˆ˜
def update_captions():
    clear_screen()
    print("\n\n\n")
    print("=" * 60)
    print("πŸŽ™οΈ μ‹€μ‹œκ°„ μŒμ„± 인식 μžλ§‰ (Ctrl+C둜 μ’…λ£Œ)")
    print("=" * 60)

    for prev in list(caption_history)[:-1]:
        print(f"\033[90m{prev}\033[0m")

    if caption_history:
        print(list(caption_history)[-1])

    if current_caption:
        print(f"\033[1m{current_caption}\033[0m", end="β–‹\n")
    else:
        print("β–‹")
    print("=" * 60)

# Audio collection thread
def audio_collection_thread():
    try:
        with sd.InputStream(samplerate=samplerate, channels=1, 
                          callback=audio_callback, blocksize=block_size):
            print("πŸŽ™οΈ Starting real-time STT... one moment, please.")
            while True:
                time.sleep(0.1)
    except Exception as e:
        print(f"Audio stream error: {e}", file=sys.stderr)
    except KeyboardInterrupt:
        pass

# STT processing thread (current OpenAI Whisper API)
def stt_processing_thread():
    global current_caption
    buffer = np.zeros((0, 1), dtype=np.float32)
    max_buffer_size = samplerate * 5  # hard cap: keep at most 5 s of audio

    try:
        while True:
            try:
                data = audio_queue.get(timeout=1)
                buffer = np.concatenate((buffer, data), axis=0)

                # Drop the oldest samples if the buffer overflows
                if len(buffer) > max_buffer_size:
                    buffer = buffer[-max_buffer_size:]

                chunk_size = int(samplerate * 3.0)  # transcribe in 3 s chunks
                if len(buffer) >= chunk_size:
                    # Write the chunk to a temporary .wav and send it to the API
                    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
                        sf.write(f.name, buffer[:chunk_size], samplerate)
                        audio_file = open(f.name, "rb")
                        response = client.audio.transcriptions.create(
                            model="whisper-1",
                            file=audio_file,
                            language="ko",
                            # "This is a meeting. Write down what is being said, clearly."
                            prompt="회의 μ€‘μž…λ‹ˆλ‹€. λ˜λ°•λ˜λ°• λ§ν•˜λŠ” λ‚΄μš©μ„ 받아적어.",
                        )
                        audio_file.close()
                        os.unlink(f.name)

                    text = response.text.strip()
                    if text:
                        with caption_lock:
                            # Start a new caption at sentence boundaries, else append
                            if not current_caption or text[0].isupper() or any(current_caption.endswith(p) for p in ['.', '!', '?', '。', '!', '?']):
                                if current_caption:
                                    caption_history.append(current_caption)
                                current_caption = text
                            else:
                                current_caption += " " + text
                        update_captions()
                        buffer = np.zeros((0, 1), dtype=np.float32)

                audio_queue.task_done()
            except queue.Empty:
                continue
    except KeyboardInterrupt:
        pass

# Main entry point
if __name__ == "__main__":
    try:
        clear_screen()

        # One thread collects audio, the other transcribes it
        t1 = threading.Thread(target=audio_collection_thread)
        t2 = threading.Thread(target=stt_processing_thread)

        # Daemon threads exit together with the main program
        t1.daemon = True
        t2.daemon = True

        t1.start()
        t2.start()

        update_captions()

        while True:
            time.sleep(0.1)

    except KeyboardInterrupt:
        clear_screen()
        print("\nπŸ›‘ Shutting down...")
        time.sleep(0.5)
        print("πŸ‘‹ Done")

 


Results

When testing, I usually read aloud from whatever study material happens to be in front of me πŸ˜…

μš΄μ˜μ²΄μ œμ— λŒ€ν•œ 글을 μ½μ—ˆμ„ λ•Œμ˜ μ½”λ“œ μ‹€ν–‰ 결과이닀.

 

μ•„λ¬΄λž˜λ„ λ‚˜μ˜ 말을 3초 λ‹¨μœ„λ‘œ κ°€μ Έμ™€μ„œ 전사λ₯Ό ν•˜κΈ°λ„ ν•˜κ³ , 인곡지λŠ₯ λͺ¨λΈμ΄λΌ 뒀에 말을 μ–΄λŠμ •λ„ μ˜ˆμƒν•΄μ„œ μž‘μ„±ν•˜κΈ°λ„ ν•˜μ—¬ μ €λ ‡κ²Œ 혼자 λ§ˆλ¬΄λ¦¬ν•΄λ²„λ¦¬λŠ” κ²½μš°λ„ λ°œμƒν•œλ‹€.

 

 

GPT dressed the console output up with colors so it looks decent in the CLI;

{prev} and {current_caption} separate the sentence currently being spoken from the ones already processed, which are shown in gray.
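
For reference, those are plain ANSI escape codes:

print("\033[90mgray: an already-finished sentence\033[0m")  # 90 = bright black (gray)
print("\033[1mbold: the sentence being spoken now\033[0m")  # 1 = bold, 0 = reset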

 

 

However....

the problem that gave me the most grief

 

was this:

 

'μ‹œμ²­ν•΄μ£Όμ…”μ„œ κ°μ‚¬ν•©λ‹ˆλ‹€.' ('Thank you for watching.')

When a stretch of silence drags on,

정적을 μ •μ μœΌλ‘œ μΈμ‹ν•˜μ§€ μ•Šκ³  μ†ŒμŒμ΄ μ˜€λ””μ˜€μ— 캑쳐되면

λ‚˜λŠ” λ§ν•œμ λ„ μ—†λŠ” 'μ‹œμ²­ν•΄μ£Όμ…”μ„œ κ°μ‚¬ν•©λ‹ˆλ‹€.' 와 같은 μœ νŠœλΈŒμ™€ κ΄€λ ¨λœ λžœλ€ν•œ 문ꡬ가 λœ¬λ‹€λŠ” 것.

 

μ΄μœ λŠ”

ν•œκ΅­μ–΄ μŒμ„± λͺ¨λΈμ„ ν•™μŠ΅ν•  λ•Œ 주둜 ν•œκ΅­μ–΄ μ±„λ„λ“€μ˜ 유튜브 - 유튜브 슀크립트 쌍으둜 ν•™μŠ΅μ„ μ§„ν–‰ν–ˆκΈ° λ•Œλ¬Έμ— μ €λŸ° 문ꡬ가 좜λ ₯λœλ‹€λŠ” 것..

 

이후에 ν…ŒμŠ€νŠΈν• λ•Œλ„ μ €κ±°λ•Œλ¬Έμ— 고생 λ§Žμ΄ν–ˆλ‹€..γ… γ… 

 

As for fixes,

 

you can set a threshold on the chunk data so that only values above a certain frequency/energy level are recognized (this didn't work great for me),

or, not at the transcription stage but at the translation stage (which uses a more capable model), write a prompt asking it to grasp the context of whatever was passed in and handle the translation on its own,

among other options..
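
For the first option, a common variant is a simple energy (RMS) gate on each chunk before it ever reaches Whisper. A minimal sketch, where the 0.01 threshold is my assumption and needs per-microphone tuning:

import numpy as np

def is_speech(chunk: np.ndarray, threshold: float = 0.01) -> bool:
    """Return True if the chunk's RMS energy clears the threshold."""
    rms = float(np.sqrt(np.mean(chunk.astype(np.float32) ** 2)))
    return rms >= threshold

# Inside stt_processing_thread, one could drop quiet chunks early:
# data = audio_queue.get(timeout=1)
# if not is_speech(data):
#     audio_queue.task_done()
#     continue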

 

 

λ‚˜λŠ” λ²ˆμ—­λ‹¨κ³„μ—μ„œ ν”„λ‘¬ν”„νŠΈν•œν…Œ μ•Œμ•„μ„œ λ‚΄μš©μ„ 쳐내달라고 ν–ˆμ—ˆλ‹€
