Transcribing Speech from Audio Files into Text with the GPT-3.5 Turbo API

Recently the author had the chance to talk with an international student who was looking for a program that can transcribe speech from audio files into text. During the conversation the idea came up that ChatGPT might be able to help. The author looked into it and found that OpenAI, the company behind ChatGPT, offers an API with this capability. (API stands for Application Programming Interface: a way for developers to use the capabilities of an application or website by sending requests from a program.) Transcribing recorded speech into text is a common, basic problem for researchers who interview study participants, and programs or websites offering this service no doubt already exist. Using an AI backed by a large language model (LLM) to help with transcription, however, seems new and interesting, which is why it is covered in this article.
The OpenAI API with this capability is called the Whisper API, which OpenAI launched on its platform in March 2023. It runs on a model named Whisper, which OpenAI had already released as open source on GitHub in September 2022. The Whisper model was trained on about 680,000 hours of multilingual speech data collected from the internet. According to OpenAI's research, Whisper makes about 50% fewer errors than specialized speech-to-text models when evaluated across many diverse datasets.
The Whisper API has been adopted by Speak, an app for practicing spoken English and the fastest-growing English-learning app in South Korea. Learners converse with an AI partner that adjusts the level of the conversation to suit them, an example of AI's ability to tailor teaching to each learner's level (Personalized Learning). Speak also uses the newer GPT-4 model for its AI Tutor feature. The app currently has more than 5 million users and a rating of 4.8.

The Whisper API can transcribe speech from files in the following formats: m4a, mp3, mp4, mpeg, mpga, wav, and webm. Pricing is based on the length of the audio, at $0.006 per minute.
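
As a rough illustration of the format and pricing rules above, the following Python sketch checks a file extension against the supported list and estimates the cost from the audio length. The helper names is_supported and estimate_cost_usd are hypothetical and not part of any OpenAI library, and the price is simply the $0.006 per minute figure quoted in this article, which may change.

```python
from decimal import Decimal
from pathlib import Path

# Formats and price as quoted in this article; both may change over time.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}
PRICE_PER_MINUTE_USD = Decimal("0.006")


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def estimate_cost_usd(minutes: float) -> Decimal:
    """Estimate the transcription cost for an audio length given in minutes."""
    return PRICE_PER_MINUTE_USD * Decimal(str(minutes))


print(is_supported("interview.mp3"))  # True
print(is_supported("notes.txt"))      # False
print(estimate_cost_usd(60))          # 0.360
```

By this estimate, a one-hour recording would cost about $0.36.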

Steps for using the Whisper API
To use the Whisper API, start by going to https://platform.openai.com and adding usage credit by registering a credit card under Settings → Billing → Payment Method.

Next, create an API key from the menu API keys → Create new secret key.

The API key is a string of letters, digits, and symbols. It is needed to send requests to the Whisper API, either with the curl command (curl is a command-line tool for exchanging data with servers on the internet) or from a program written in Python.

1. Creating a request with the curl command
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F model="whisper-1" \
  -F file="@/path/to/file/openai.mp3"
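In the example above, put your API key in place of $OPENAI_API_KEY (or leave it as is if the key has been exported as a shell environment variable of that name) and put the file name in the file parameter. The file name must be prefixed with the @ sign; for example, if the file is named audio.mp4, the parameter becomes file="@audio.mp4".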
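
For readers curious about what the curl flags translate to, here is a minimal Python sketch that builds (but does not send) an equivalent multipart/form-data POST request using only the standard library. The function name build_transcription_request is hypothetical, and the part headers it writes are an assumption about a reasonable multipart layout, not a copy of what curl emits byte for byte.

```python
import uuid
import urllib.request

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(api_key, filename, audio_bytes, model="whisper-1"):
    """Build a POST request with two form fields: model and file."""
    boundary = uuid.uuid4().hex  # separator between the form parts
    body = b"".join([
        # Part 1: the "model" field (curl: -F model="whisper-1")
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Part 2: the "file" field (curl: -F file="@...")
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # curl: -H "Authorization: Bearer $OPENAI_API_KEY"
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder key and bytes, for illustration only; nothing is sent here.
request = build_transcription_request("sk-your-key", "audio.mp4", b"fake audio bytes")
print(request.get_method(), request.full_url)
```

Actually sending the request would be a matter of passing it to urllib.request.urlopen, which requires a valid API key, real audio bytes, and network access.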

The Whisper API prints the result on screen as JSON (a widely used data-structure format), for example:
{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger…"
}

The transcribed text is the quoted value of the "text" key; in the example above it begins with the word Imagine.
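
A short Python sketch of pulling the transcript out of such a response is shown here. The sample response string is made up for illustration, and extract_text is a hypothetical helper name.

```python
import json

# Hypothetical sample response; a real one holds the full transcript.
response_body = '{"text": "Hello, this is a test recording."}'


def extract_text(body: str) -> str:
    """Return the transcript stored under the "text" key of the JSON response."""
    return json.loads(body)["text"]


print(extract_text(response_body))
```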

To get the response in a file instead, we can use shell redirection by adding the > sign after the curl command. For example, to save the result in a file named transcribed_audio.txt (which will contain the JSON response), change the curl command above as follows:
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F model="whisper-1" \
  -F file="@/path/to/file/openai.mp3" > transcribed_audio.txt

2. Using a Python program
In the Python version, the audio file is assigned to the variable audio_file and the result is received in the variable transcription.
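
The article's own listing is not reproduced here, so what follows is only a minimal sketch of what such code might look like, assuming the official openai Python package (version 1 or later) and its client.audio.transcriptions.create call. The wrapper functions make_client and transcribe are hypothetical names chosen for this example; the variable names audio_file and transcription follow the description above.

```python
def make_client():
    """Create an OpenAI client; it reads the key from the OPENAI_API_KEY environment variable."""
    from openai import OpenAI  # third-party package: pip install openai
    return OpenAI()


def transcribe(client, path: str, model: str = "whisper-1") -> str:
    """Send the audio file at `path` to the transcription endpoint and return the text."""
    with open(path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
        )
    return transcription.text


# Usage (needs a valid API key and network access):
#   print(transcribe(make_client(), "/path/to/file/openai.mp3"))
```

Passing the client in as an argument keeps the transcription step separate from key handling, which also makes it easy to try the function with a stand-in client.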

The Python code needs the API key to be specified beforehand in a separate file named .env.
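
A .env file is conventionally a plain-text list of NAME=value lines, so here it is assumed to contain a line of the form OPENAI_API_KEY=<your key>. Many projects load such files with the third-party python-dotenv package; the sketch below is a small standard-library stand-in, and load_env_file is a hypothetical helper name.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Read NAME=value lines from a .env-style file into os.environ."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an equals sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        value = value.strip().strip('"').strip("'")  # drop optional quotes
        os.environ[name] = value
        loaded[name] = value
    return loaded


loaded = load_env_file()  # looks for .env in the current directory
print("OPENAI_API_KEY loaded:", "OPENAI_API_KEY" in loaded)
```

Keeping the key in .env rather than in the script itself makes it easier to avoid sharing the key by accident when the code is passed around.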

In the author's experience of using the Whisper API through curl, a one-hour English-language interview recording was transcribed in under 2 minutes.

Since 24 April 2024, the name ChatGPT API is no longer used and the API is named after the model it uses; the reference article [1] calls it the GPT-3.5 Turbo API.

References
[1] Introducing APIs for GPT-3.5 Turbo and Whisper. [Online]. Available: https://openai.com/index/introducing-chatgpt-and-whisper-apis/
[2] Introducing Whisper. [Online]. Available: https://openai.com/index/whisper/
[3] Speak – The language learning app that gets you speaking. [Online]. Available: https://www.speak.com

Innovation Newsletter (จุลสารนวัตกรรม), Issue 74 – Useful Knowledge: Transcribing Speech from Audio Files into Text with the GPT-3.5 Turbo API
