Slow but steady

[LLM] Designing a system that performs STT to TTS during video meetings - 3. Implementing a real-time STS (STT-translation-TTS) service on an OpenVidu-based video conferencing system


.23 2025. 4. 20. 18:12

Finally, the last one..

Previous post: [LLM] Designing a system that performs STT to TTS during video meetings - 2. Implementing a system capable of real-time STT and translation (+ FastAPI model serving)

 

Project purpose

Before introducing the code, I want to explain the purpose of the project that needed this feature. We designed a system that makes troubleshooting at a local factory easier and narrows the communication gap between headquarters and the local factory.

 

When an issue occurs inside a local factory, it is handled in the following steps.

Step 1: Anomaly monitoring and detection

The site runs its own problem-detection system, and when an anomaly occurs it sends a notification so the local factory can check when it happened and what kind of problem it is.

Step 2: Troubleshooting through the chatbot

When the problem description and photos are uploaded to the chatbot, the system analyzes the situation and, through RAG, refers to previously accumulated similar cases, troubleshooting lists, and technical guidelines to suggest a solution.

Step 3: Notifying headquarters

If the chatbot cannot resolve the problem, the incident details and the chatbot consultation so far are compiled and automatically emailed to headquarters and partner companies.

Step 4: Video meeting with headquarters

The problem is worked through with headquarters in a video meeting that provides real-time interpretation.

Step 5: Automatic meeting minutes

The script recorded automatically by the transcription feature is compiled and summarized into meeting minutes. This adds another piece of data that can be used to respond to problems that come up later.

 

 

The purpose of the project was to build an environment where, through this five-step pipeline, the local factory gets troubleshooting guidance more easily and without a language barrier, while headquarters can manage the site comfortably and keep unavoidable on-site dispatches to a minimum.

Of these, I was responsible for developing the core feature of step 4, the video meeting with headquarters.

 

Overall flow

The overall structure and technical flow of the project are as follows.

 

[Figure: overall architecture diagram. Caption: sorry for reposting this without permission, and thank you.]

Using GitHub Actions, code changes are deployed automatically to the EC2 server, and the system is made up of roughly seven servers.

 

The main features for headquarters and the local factory (the models used inside the video meeting, the chatbot, anomaly detection, and so on) all live in FastAPI,

Vue is the front end,

MariaDB stores the meeting minutes and the connection information for video meetings,

and Spring likewise handles storing video meeting information.

 

Flow of this feature

Of all that, the flow I designed is the one shown in the figure above..

The structure changed a lot compared with when I first pre-designed the project and ran a mini-project based on that design.

 

1. How audio is delivered: real time → recorded segments only

Given how OpenVidu works, factors such as unstable communication on free-tier infrastructure and TTS audio leaking into the recording meant that, realistically, the way to get good-quality STT results was to send only the recorded portion. So I simply changed it so that whatever is recorded between pressing the record button and pressing stop is sent as one utterance.

In particular, the professor who assigned the project emphasized voice-to-voice communication. To apply automated STS in an actual video meeting, either the voice input or the voice output has to involve a human action, because when participants use speakers the output voice will inevitably get mixed into the input. After discussing it with the team, opinion leaned toward automating the voice 'output', so I revised the design so that the human action happens at recording time.

 

2. Transport: WebSocket → REST API

Because of item 1, the transport changed too. Instead of an always-on WebSocket connection, audio is sent only when a button press triggers it, and the transcription and translation come back as the response, in a RESTful style.

 

3. Asynchronous design: only for receiving audio

Since the transport is now a REST API, there is no longer any need to deliver transcription and translation results separately as in the WebSocket implementation. So I switched to a structure where a single request reliably returns the transcribed text, the translation, and the TTS audio bundled together.

However, considering that in a meeting several users may press the microphone and speak at the same time, I kept thread-based asynchronous processing for the audio input.

 

File structure

headquater_system/
├── config.py           # Global settings (API keys, audio settings, global clients, etc.)
├── main.py             # Program entry point: loads each module and starts the threads
├── modules/
│   ├── __init__.py
│   ├── stt.py          # STT processing (Whisper API, VAD, language detection)
│   ├── translation.py  # Translation (uses GPT-4o-mini)
│   ├── tts.py          # TTS (uses GPT-4o-mini-tts)
│   ├── users.py        # User info object and update functions
│   └── utils.py        # Shared utilities (language correction, log file naming, display updates, etc.)
└── routers/
    ├── __init__.py
    └── hq.py           # Router definitions

 

Most of the structure and functions are the same as before. The difference is that users.py is newly defined here to manage, per user, the user info (name), source language, target language, and each user's utterances.

 


Code

Actions per router

The object received from the front end is defined as STTPayload, and the object sent back is a list of CombinedResult.

from typing import Optional

from pydantic import BaseModel

class STTPayload(BaseModel):
    type: str
    speakerInfo: dict
    audioData: str                    # base64-encoded Int16 PCM
    sampleRate: int
    timestamp: Optional[int] = None   # epoch milliseconds

class CombinedResult(BaseModel):
    speaker: str
    transcription: str
    translation: str
    tts_voice: str

class CombinedResultsResponse(BaseModel):
    results: list[CombinedResult]

 

When a POST request with the object shape defined above comes in, everything from which object is received to which object is returned is written as below. The function is long, so I cut it into pieces........

# 1. STT handler
@hq_router.post("/stt/audio", response_model=CombinedResultsResponse)
async def stt_audio_endpoint(payload: STTPayload):
    print("[DEBUG] POST request")
    global date_log

    # Check the payload type
    if payload.type != "live_audio_chunk":
        raise HTTPException(status_code=400, detail="Invalid payload type")

 

Up to here is the basic part that receives the data. If the payload's 'type' is anything other than live_audio_chunk, the handler raises a 400 exception so processing goes no further (a request with no 'type' at all is already rejected by FastAPI's request validation with a 422).

 

    # Extract the user info
    speaker_info = payload.speakerInfo
    speaker_name = speaker_info.get("name", "Unknown")
    source_lang = speaker_info.get("speakerLang", "ko")
    target_lang = speaker_info.get("targetLang", "en")
    session_id = speaker_info.get("sessionId", None)  # Extract sessionId
    
    print(f"{speaker_name}: src {source_lang}, tar {target_lang}, sessionId: {session_id}")
    # Handle the timestamp (epoch milliseconds)
    timestamp = payload.timestamp if payload.timestamp is not None else int(time.time() * 1000)
    if not date_log:
        dt = datetime.fromtimestamp(timestamp / 1000)
        date_log = dt.strftime("%Y%m%d_%H%M%S")
    
    # This is REST, so websocket is None
    user = get_or_create_user(speaker_name, source_lang, target_lang, websocket=None, session_id=session_id)

    # Start the per-user STT thread (runs the first time data arrives after connecting)
    if not user.processing_started:
        threading.Thread(
            target=stt_processing_thread,
            args=(user,),  # uses the user's own audio_queue, etc.
            daemon=True
        ).start()

        user.processing_started = True
    
    # Decode audioData and convert the PCM data (Int16 -> float32, normalize, reshape to mono)
    try:
        raw_bytes = base64.b64decode(payload.audioData)
        print(f"raw_bytes: {len(raw_bytes)}")
        audio_np = np.frombuffer(raw_bytes, dtype=np.int16).astype(np.float32) / 32768.0
        audio_np = audio_np.reshape(-1, 1)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Audio data decoding error: {e}")
    
    # Put the data on the user's audio queue
    user.audio_queue.put((audio_np, payload.sampleRate))

 

Next, the data in the payload is used to register the user or update their metadata (source language / target language), the per-user STS processing thread is started, and the audio is put on that user's audio_queue.
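To make the audioData format concrete, here is a minimal round-trip sketch of what the handler above expects: mono Int16 PCM, base64-encoded, normalized by 32768 on the server. The function names encode_pcm16 and decode_pcm16 are hypothetical, the real client is the front end rather than Python, and the standard library array module stands in for NumPy so the sketch runs on its own.

```python
import array
import base64
import sys


def encode_pcm16(samples: list[float]) -> str:
    """Client side (assumed): clamp to [-1, 1], pack as little-endian Int16, base64-encode."""
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        ints.byteswap()  # the wire format assumed here is little-endian
    return base64.b64encode(ints.tobytes()).decode("ascii")


def decode_pcm16(audio_data: str) -> list[float]:
    """Server side: mirrors the handler's Int16 -> float normalization (divide by 32768)."""
    ints = array.array("h")
    ints.frombytes(base64.b64decode(audio_data))
    if sys.byteorder == "big":
        ints.byteswap()
    return [i / 32768.0 for i in ints]


payload_audio = encode_pcm16([0.0, 0.5, -1.0, 2.0])  # 2.0 is clamped to 1.0
decoded = decode_pcm16(payload_audio)
```

If the client sends anything other than 16-bit samples (for example raw Float32 from the browser's audio API), the decode step still succeeds but produces noise, so the two sides have to agree on this format.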

 

    # Assuming the background STT/translation/TTS thread is running,
    # take the result from final_results_queue once it is ready and build the combined message
    combined_results = []
    
    # Wait (by polling) up to `timeout` seconds for a result to be produced
    timeout = 120.0  # maximum wait time in seconds
    poll_interval = 0.2  # poll every 200 ms
    waited = 0.0

    while timeout > waited:
        try:
            # (STT result, translation result, TTS audio (ogg, base64-encoded))
            transcription, translation, tts_voice = user.final_results_queue.get_nowait()
 
            print(f"Transcription: {transcription}\nTranslation: {translation}")
            combined_results.append({
                "speaker": speaker_name,
                "transcription": transcription,
                "translation": translation,
                "tts_voice": tts_voice
            })
            user.final_results_queue.task_done()

            break  # got a result, so stop waiting
        except queue.Empty:
            # No result yet, so wait briefly
            await asyncio.sleep(poll_interval)
            waited += poll_interval

    return CombinedResultsResponse(results=combined_results)

 

Finally, to avoid an infinite loop, a timeout (120 seconds) is set. If transcription and translation finish within that time, the full result (STT result, translation result, encoded TTS audio) is taken from final_results_queue and sent back to the front end.

Park Bonsa's screen
Nakamura Kenjo-san's screen

This is what the whole thing looks like when running 👍
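As an alternative to the 200 ms polling loop, a blocking queue.Queue.get can be pushed onto a worker thread so the event loop stays free. This is only a sketch under that assumption, not what the project does; wait_for_result is a hypothetical helper, and asyncio.to_thread needs Python 3.9 or later.

```python
import asyncio
import queue


async def wait_for_result(results: queue.Queue, timeout: float = 120.0):
    """Wait for one item without a polling loop; return None on timeout."""
    try:
        # queue.Queue.get(block, timeout) blocks, so it runs in a worker thread.
        return await asyncio.to_thread(results.get, True, timeout)
    except queue.Empty:
        return None


ready = queue.Queue()
ready.put(("hello", "konnichiwa", ""))
first = asyncio.run(wait_for_result(ready, timeout=0.1))
second = asyncio.run(wait_for_result(queue.Queue(), timeout=0.05))
```

The trade-off is that each waiting request holds one thread from the default executor for up to the full timeout, so with many simultaneous speakers the polling version may actually be the cheaper choice.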

User

# modules/users.py

import threading
import time
import queue
from typing import Optional

# Class that holds per-user information
class User:
    def __init__(self, name: str, source_lang: str = "ko", target_lang: str = "en", session_id: str = None):
        self.name = name
        self.source_lang = source_lang  # language the user speaks
        self.target_lang = target_lang  # language to translate into
        self.last_update = time.time()  # last update time; other info can be recorded here too
        self.session_id = session_id    # session ID
        
        self.detected_language = source_lang  # language detected for this user
        self.processing_started = False  # whether the processing thread has been started

        # Queues owned by this user
        self.audio_queue = queue.Queue()       
        self.final_results_queue = queue.Queue()

        self.websocket = None

    def update(self, name: str = None, source_lang: str = None, target_lang: str = None, websocket=None, session_id: str = None):
        if name:
            self.name = name
        if source_lang:
            self.source_lang = source_lang
        if target_lang:
            self.target_lang = target_lang
        if session_id:
            self.session_id = session_id
        # Update the WebSocket object only when a new one is provided
        if websocket is not None:
            self.websocket = websocket
        self.last_update = time.time()

# Global user store (guarded by a lock for concurrent access)
users_lock = threading.Lock()
users = {}

def get_or_create_user(name: str, default_source: str = "ko", default_target: str = "en", websocket=None, session_id: str = None) -> User:
    with users_lock:
        if name in users:
            user = users[name]
            # Update the existing user's info (including the WebSocket)
            user.update(name=name, source_lang=default_source, target_lang=default_target, websocket=websocket, session_id=session_id)
            return user
        else:
            user = User(name, default_source, default_target, session_id)
            user.websocket = websocket
            users[name] = user
            print(f"[DEBUG] New user added: {user.name}, session ID: {session_id}, participants: {len(users)}")
            return user

def get_user_by_connection(connection_id: str) -> Optional[User]:
    # Returns None when no user is stored under this key
    with users_lock:
        return users.get(connection_id)

 

I kept touching the user-related functions right up to the end, so the code isn't very tidy..😅

This code takes what was previously declared in on_event("startup") and as other global variables, and defines all of it in class User so that it is set up per user.

 

English / Japanese / Chinese / Turkish / Korean

For testing, I predefined five user accounts and gave each person a different language.

 

For now, on the premise that two accounts take part, the front end detects the meeting participants, looks up the language of the other participant (the one who isn't me), and sends it along in the payload as the target language. For that reason,

get_or_create_user is written so that, for a user who already exists, the target language can be updated.

This is the only Chinese I know.

So, !!without any separate language selection step!! (a small brag 👍), you can see the translation language change per participant depending on who they are talking to.

 

Actually, for transcription, whisper itself can detect the language automatically to some extent, and with the automatic language detection code I used before, the user's source language doesn't really need to be sent from the front end, haha.

Setting the target language is the harder task, so I'm thinking about a revision that assumes three or more people may join a meeting later and that the participants may span three or more languages.
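One possible direction for the three-or-more case, sketched under my own assumptions rather than anything implemented in the project: collect the distinct languages of every participant except the speaker and translate once per language. The target_languages helper and the name-to-language dict shape are hypothetical.

```python
def target_languages(participants: dict[str, str], speaker: str) -> list[str]:
    """Return the distinct languages of everyone except the speaker.

    `participants` maps a participant name to a language code (hypothetical shape).
    """
    own = participants[speaker]
    others = {lang for name, lang in participants.items() if name != speaker}
    others.discard(own)  # listeners who share the speaker's language need no translation
    return sorted(others)


meeting = {"park": "ko", "nakamura": "ja", "li": "zh", "kim": "ko"}
targets_for_park = target_languages(meeting, "park")
```

With this shape the payload would carry a list of target languages instead of a single targetLang, and the response would need one translation and one TTS clip per language, which multiplies API calls and latency accordingly.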

 

STS pipeline

def stt_processing_thread(user):
    while True:
        try:
            # Assumes (audio_np, sample_rate) tuples were put on user.audio_queue
            data_tuple = user.audio_queue.get(timeout=1)
            try:
                if isinstance(data_tuple, tuple):
                    data, sample_rate = data_tuple
                else:
                    data = data_tuple
                    sample_rate = 16000  # default is 16000

                # Optional: speech check (is_speech) on the whole clip
                if len(data) < int(sample_rate * 0.5) or not is_speech(data):
                    print(f"[DEBUG] {user.name} - no speech, or utterance too short")
                    # Nothing is queued here, so the waiting request runs until its timeout
                    continue
                    
                text = stt_processing(user, data, sample_rate)
                if not text:
                    print(f"[DEBUG] {user.name} - no STT result")
                    continue
            
                print(f"[DEBUG] STT result: {text}")

                # "Please transcribe exactly what you hear." is the prompt echoed back on failure, so skip it
                if text.strip().lower().startswith("please transcribe exactly what you hear"):
                    print(f"[DEBUG] {user.name} - echoed prompt detected, skipping")
                    continue

                try:
                    translation = translation_process(user, text)
                except Exception as te:
                    print(f"[DEBUG] {user.name} error while calling translation: {te}", file=sys.stderr)
                    translation = ""
                
                try:
                    tts_voice = tts_process(translation)
                except Exception as te:
                    print(f"[DEBUG] {user.name} error while calling TTS: {te}", file=sys.stderr)
                    tts_voice = ""

                user.final_results_queue.put((text, translation, tts_voice))
            finally:
                user.audio_queue.task_done()
            
        except queue.Empty:
            continue
        except Exception as e:
            print(f"Error during STT processing: {e}", file=sys.stderr)

 

 

The code structure got somewhat simpler.

Each process that used to be set up as its own thread is now a regular function,

and when audio comes in, the function for each stage runs in turn, and once the text has been converted everything is bundled and put on final_results_queue. Done.
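One consequence of this structure is that a skipped utterance (too short, no speech, empty STT) puts nothing on final_results_queue, so the request that submitted it waits out the full timeout. A possible adjustment, shown only as a sketch, is to enqueue an empty result on every skip. The names EMPTY_RESULT and process_one are hypothetical, transcribe is a stand-in callable for the real STT call, and translation and TTS are left out to keep it short.

```python
import queue

EMPTY_RESULT = ("", "", "")  # hypothetical marker meaning "nothing to translate"


def process_one(audio, sample_rate, results: queue.Queue, transcribe, min_seconds: float = 0.5):
    """Handle one utterance and always put exactly one result on the queue."""
    if len(audio) < int(sample_rate * min_seconds):
        results.put(EMPTY_RESULT)  # too short: answer right away instead of staying silent
        return
    text = transcribe(audio, sample_rate)
    # Translation and TTS would run here in the real pipeline.
    results.put((text, "", "") if text else EMPTY_RESULT)


out = queue.Queue()
process_one([0.0] * 100, 16000, out, lambda a, r: "ignored")
short_result = out.get_nowait()
process_one([0.0] * 16000, 16000, out, lambda a, r: "hello")
normal_result = out.get_nowait()
```

The front end would then have to ignore results whose transcription is empty, which is a small price for not holding a connection open for two minutes.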

 

STT

def stt_processing(user, data, sample_rate):
    tmp_path = None
    try:
        # Save to a temporary file for STT (the Whisper API takes a file)
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            tmp_path = f.name
        # Write after the handle is closed so the file can be reopened on any OS
        sf.write(tmp_path, data, sample_rate, format='WAV', subtype='PCM_16')

        # Call the Whisper API
        with open(tmp_path, "rb") as audio_file:
            response = CLIENT.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                language=user.source_lang,
                prompt="We're now on meeting. Please transcribe exactly what you hear."
            )

        return response.text.strip()
        
    except Exception as e:
        print(f"Error during STT processing: {e}", file=sys.stderr)
        return ""
    finally:
        # Delete the temporary file whether or not an exception occurred
        if tmp_path and os.path.exists(tmp_path):
            try:
                os.unlink(tmp_path)
            except Exception as del_e:
                print(f"Error deleting temporary file: {del_e}", file=sys.stderr)

 

Much simpler than before: the function now takes only the user info, the input audio, and the audio's sample rate (!!very important!!) and returns the transcribed text.

Originally I wanted to take the audio from OpenVidu, but since OpenVidu requires a minimum spec of 8GB or more, on a free-tier environment the other party's audio/video streaming did not work properly.

So, to get audio of guaranteed quality in a more stable environment, recording is done with the local microphone.

In that case the microphone settings can differ from device to device, so passing the audio's sample rate along is very important.. It is used for encoding and decoding, and if the value doesn't match, the audio doesn't come through properly..🥲
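To show why the sample rate matters, the sketch below wraps the same raw PCM bytes in a WAV container twice, once with the true rate and once with a wrong one, using only the standard library wave module (the project itself uses soundfile). The helper names pcm16_to_wav and wav_duration are hypothetical.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container that records the sample rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def wav_duration(wav_bytes: bytes) -> float:
    """Duration in seconds, as a decoder would compute it from the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getnframes() / r.getframerate()


one_second_at_48k = b"\x00\x00" * 48000  # 48,000 silent samples recorded at 48 kHz
correct = wav_duration(pcm16_to_wav(one_second_at_48k, 48000))
mislabeled = wav_duration(pcm16_to_wav(one_second_at_48k, 16000))
```

Labeling 48 kHz audio as 16 kHz makes the same samples play back three times slower and lower in pitch, which is enough to make transcription fail even though no bytes were lost.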

 

Translation

def translation_process(user, text):
    try:
        # If the source language and the target language are the same, pass the text through untranslated
        if user.source_lang == user.target_lang:
            print(f"[DEBUG] {user.name} source and target language are the same, sending without translation")
            return text
            
        # Call the translation API (here: a GPT-4o-mini translation request)
        try:
            source_name = language_map.get(user.source_lang, "the detected language")
            # Fall back to the raw language code if it is missing from language_map
            target_name = language_map.get(user.target_lang, user.target_lang)
            response = CLIENT.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": f"""You are a professional interpreter. When translating from {source_name} to {target_name},
follow these rules:
1. If something is ambiguous or open to misunderstanding → translate it using clear terms.
2. If it rests on incorrect information → translate it so that it gently asks for reconfirmation.
3. If the remark is narrow-minded or aggressive → adjust the tone so it turns into an opportunity for constructive discussion.
4. If the discussion is repeating or stalling → summarize the key point and steer back to the topic.
5. If the remark is emotionally heated → translate while acting as a neutral buffer.
Translate exactly what they say, without any extra commentary."""},
                    {"role": "user", "content": text}
                ]
            )
            translation = response.choices[0].message.content.strip()
            print(f"[DEBUG] {user.name} translation result: {translation}")
        except Exception as e:
            print(f"[DEBUG] {user.name} translation error: {e}", file=sys.stderr)
            translation = text
        return translation
        
    except Exception as e:
        print(f"[DEBUG] {user.name} error during translation processing: {e}", file=sys.stderr)
        return text

 

The prompt in this function got a bit longer.

Previously I simply asked for a translation, but assuming this program is used in real work, I revised the prompt as above to reflect the business manners and rules a business interpreter should follow in that setting.

Reference: https://www.linkedin.com/posts/xissy_%EC%BF%A0%ED%8C%A1%EC%9D%84-%EB%96%A0%EB%82%98-%EB%8B%A4%EB%A5%B8-%ED%95%9C%EA%B5%AD-%EC%8A%A4%ED%83%80%ED%8A%B8%EC%97%85%EB%93%A4%EC%97%90%EC%84%9C-%EC%9D%BC%ED%95%98%EB%A9%B4%EC%84%9C-%EB%8A%98-%ED%92%88%EA%B3%A0-%EC%9E%88%EB%8D%98-%EA%B6%81%EA%B8%88%EC%A6%9D%EC%9D%B4-%ED%95%98%EB%82%98-activity-7278644686020296704-Bg7l?utm_source=share&utm_medium=member_desktop&rcm=ACoAAC6WNvQB0_t0vhm3zjf2As1MOEsY_MwkSws

 


 

TTS

def tts_process(translation):
    print(f"[TTS] Text to synthesize: {translation}")
    temp_audio_path = None
    try:
        # The response format is Opus, so use an .ogg suffix for the temporary file
        with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as temp_file:
            temp_audio_path = Path(temp_file.name)
            print(f"[DEBUG] TTS temporary file path: {temp_audio_path}")

        # Call the TTS API (model="tts-1") via CLIENT.audio.speech.with_streaming_response.create
        with CLIENT.audio.speech.with_streaming_response.create(
            model="tts-1",      # TTS model: tts-1
            voice="nova",       # optional (set to whichever voice you want)
            input=translation,
            response_format="opus"
            # instructions="Optional additional instructions"  # extra guidance if needed
        ) as response:
            response.stream_to_file(temp_audio_path)
        
        # Read the generated audio file in binary mode, then base64-encode it
        with open(temp_audio_path, "rb") as audio_file:
            audio_bytes = audio_file.read()
        base64_audio = base64.b64encode(audio_bytes).decode('utf-8')
        print(f"[DEBUG] TTS audio base64-encoded (length: {len(base64_audio)} characters)")
        
        # Return the base64-encoded audio to the caller
        return base64_audio
    except Exception as e:
        print(f"TTS error: {e}", file=sys.stderr)
        return ""
    finally:
        # Delete the temporary file even if the API call failed
        if temp_audio_path and temp_audio_path.exists():
            temp_audio_path.unlink()

 

 

You can specify the format of the audio that gets sent. The important point here is that with mp3, a roughly 5-second utterance comes to over 70KB once base64-encoded .. so when the data was being sent, it kept hanging on loading and then the message itself got dropped T_T

From what I could find it looked like a network performance issue.. so to use smaller audio data, I use the opus format, a low-latency, high-compression codec optimized for WebSocket streaming and real-time communication.

To set this in the OpenAI API, specify it in response_format. The default is mp3, so just choose one of the others.

The options are as follows:

  • MP3: The default response format for general use cases.
  • Opus: For internet streaming and communication, low latency.
  • AAC: For digital audio compression, preferred by YouTube, Android, iOS.
  • FLAC: For lossless audio compression, favored by audio enthusiasts for archiving.
  • WAV: Uncompressed WAV audio, suitable for low-latency applications to avoid decoding overhead.
  • PCM: Similar to WAV but contains the raw samples in 24kHz (16-bit signed, low-endian), without the header.

๐Ÿ”— ์ฐธ๊ณ : https://platform.openai.com/docs/guides/text-to-speech

 

With Opus (.ogg), I could confirm that the encoded payload is about 90% smaller than with mp3.

And when actually listening in a video meeting, I can't really notice a difference in sound quality. Yahoo 🙌
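For a sense of how much of that payload is base64 overhead, a small arithmetic sketch: base64 turns every 3 bytes into 4 characters, so the encoded string is about a third larger than the audio file itself. The byte count below is an illustrative number chosen to land on 70,000 characters, not a measurement from the project.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


mp3_bytes = 52_500  # hypothetical size of a short mp3 clip
encoded_chars = base64_len(mp3_bytes)
overhead = encoded_chars / mp3_bytes
```

Switching codecs attacks the audio size itself, while the roughly 33% base64 overhead stays the same for any format as long as the audio travels inside the JSON body.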

 

That said, if scalability is a concern, uploading the audio as a temporary file and deleting it after the meeting ends would also be worth considering.


Results

 

There's a bit of howling (audio feedback)..

 
