<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://pythonbook.cc/articles</id>
    <title>為你自己學 PYTHON Blog</title>
    <updated>2024-11-18T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://pythonbook.cc/articles"/>
    <subtitle>為你自己學 PYTHON Blog</subtitle>
    <icon>https://pythonbook.cc/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA["為你自己學 Python" Is Now in Print!]]></title>
        <id>https://pythonbook.cc/articles/python-book-in-print</id>
        <link href="https://pythonbook.cc/articles/python-book-in-print"/>
        <updated>2024-11-18T00:00:00.000Z</updated>
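        <summary type="html"><![CDATA[To give readers the most accurate understanding, the author spent more than a year organizing material and digging into the Python source code, all to uncover the details of Python that are easily overlooked or misunderstood. Self-taught beginners are often influenced by inaccurate ideas, so the book strives to present how things really work. This is not just a teach-yourself programming book; it is also a guide to understanding the technology in depth, helping you master Python's core concepts and build a solid foundation for becoming a professional.]]></summary>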
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="為你自己學 Python" src="https://pythonbook.cc/assets/images/banner-ed78ea08d99a597fbd34c1851dbdb5ba.webp" width="1800" height="410" class="img_VgU6"></p>
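<p>The book "為你自己學 Python" has been officially published. Its full text can be read for free on the <a href="https://pythonbook.cc/" target="_blank" rel="noopener noreferrer">website</a>, but if you prefer the feel of a book in your hands, the print edition can now be bought from the 天瓏書局 (Tenlong Bookstore) website:</p>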
<ul>
<li>Print edition: <a href="https://5xcamp.us/pythonbook" target="_blank" rel="noopener noreferrer">https://5xcamp.us/pythonbook</a> (sold exclusively by 天瓏書局)</li>
<li>E-book:
<ul>
<li>Leanpub：<a href="https://leanpub.com/learn-python-for-your-own-good" target="_blank" rel="noopener noreferrer">https://leanpub.com/learn-python-for-your-own-good</a></li>
<li>Kobo：<a href="https://www.kobo.com/tw/zh/ebook/python-258" target="_blank" rel="noopener noreferrer">https://www.kobo.com/tw/zh/ebook/python-258</a></li>
</ul>
</li>
</ul>
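<p>Although I wrote Python for five or six years and taught Python classes in the community for a while, sitting down to write a book is a different story. From the first draft to the final manuscript, this book took me a little over a year. Besides thoroughly reorganizing my old course material, I also updated the software versions it used (it was still the Python 2.x era back then). On top of that, some of the ideas and opinions I once held have changed as I have grown older, so I took this chance to correct my own understanding of Python.</p>
<p>This book uses Python 3.12 as its main teaching version. It covers environment setup and Python syntax, including common data types, logic and flow control, loops, error handling, functions, modules, object-oriented programming, and file handling, and it uses a web crawler to fetch and analyze data. There are no fancy tricks, only the most basic programming concepts, in the hope that readers can build a solid and correct foundation while learning Python.</p>
<p>While writing, in addition to consulting the official documentation and PEPs (Python Enhancement Proposals), whenever I could not work out the rationale behind a design and the documentation did not cover it, I went straight to Python's C source code to verify my ideas. Self-taught beginners are easily influenced by not-quite-correct ideas without realizing it, so in this book I strive to present correct and precise concepts. I hope this is not only a self-study reference but also a guide that helps readers grasp Python correctly and build their own "Single Source of Truth".</p>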
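<h2 class="anchor anchorWithStickyNavbar_XzRG" id="關於這本書">About This Book<a href="https://pythonbook.cc/articles/python-book-in-print#%E9%97%9C%E6%96%BC%E9%80%99%E6%9C%AC%E6%9B%B8" class="hash-link" aria-label="關於這本書的直接連結" title="關於這本書的直接連結">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="跟其它的書有什麼不同">How Is It Different from Other Books?<a href="https://pythonbook.cc/articles/python-book-in-print#%E8%B7%9F%E5%85%B6%E5%AE%83%E7%9A%84%E6%9B%B8%E6%9C%89%E4%BB%80%E9%BA%BC%E4%B8%8D%E5%90%8C" class="hash-link" aria-label="跟其它的書有什麼不同？的直接連結" title="跟其它的書有什麼不同？的直接連結">​</a></h3>
<p>There are already a great many Python books on the market. From a marketing standpoint, whenever a new course or product is launched, you have to name some distinguishing features to attract readers. The usual tactic is to criticize the shortcomings of the other books out there, then explain how this book is different, and promise that buying it will get you a perfect score on every exam. Stepping on other products, or on other people's heads, to raise myself up is not my style.</p>
<p>That said, while writing this book I did go to bookstores and flip through other Python books, and some of their content is indeed not quite right. For example, some authors explain Python's object-oriented programming using concepts from Java's. Python classes simply have no <code>private</code> design like Java's, yet those books describe it as if they did. I believe the authors do not intend to mislead readers, but this is not good for beginners who may be unable to tell what is correct. I cannot guarantee that everything I wrote is 100% correct, but I required myself to verify every concept while writing, which is one reason this book took so long to finish. If you find any mistakes, corrections are very welcome.</p>
<p>Meanwhile, programming is now part of the national compulsory education curriculum, and I want more people to be able to get learning resources at a lower cost. In the spirit of open source, I have put the full text of this book on the "<a href="https://pythonbook.cc/" target="_blank" rel="noopener noreferrer">為你自己學 Python</a>" website and licensed it to the public under <a href="https://5xcamp.us/cc-nc-sa" target="_blank" rel="noopener noreferrer">CC BY-NC-SA 4.0</a>, so you can read it without paying anything. If you are a school teacher or professor and plan to use this book as teaching material, feel free to use it directly. If you need printed copies, contact me and I can provide them to teaching institutions at the lowest printing cost.</p>
<p>So, if I really had to say how my book differs from others, it is probably that <strong>my main purpose in writing this book is teaching, not making money</strong> (although money is still nice). That is probably what sets it apart from other books (I think).</p>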
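<h3 class="anchor anchorWithStickyNavbar_XzRG" id="推薦序">Forewords<a href="https://pythonbook.cc/articles/python-book-in-print#%E6%8E%A8%E8%96%A6%E5%BA%8F" class="hash-link" aria-label="推薦序的直接連結" title="推薦序的直接連結">​</a></h3>
<p>A friend who was reading this book asked me: since I know so many brilliant experts in the industry, why not ask them to write forewords? The answer is simple: I just do not like to trouble other people. Also, when I read a book I usually skip the forewords, because I want to judge the content through my own reading rather than be swayed by someone else's recommendation. If I do not read forewords myself, I cannot expect others to read mine.</p>
<p>Furthermore, I have confidence in this book's content, and I want the content to speak for itself. If you read it and find it useful, please share it so that more people hear about the book. To me, that is the best foreword of all.</p>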
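<h2 class="anchor anchorWithStickyNavbar_XzRG" id="關於出版">About Publishing<a href="https://pythonbook.cc/articles/python-book-in-print#%E9%97%9C%E6%96%BC%E5%87%BA%E7%89%88" class="hash-link" aria-label="關於出版的直接連結" title="關於出版的直接連結">​</a></h2>
<p>I have published several books before, all in cooperation with publishers, with dedicated editors handling the publishing process so that I only had to concentrate on writing. I have long had some ideas about computer books that I wanted to realize, but publishers usually have established processes or frameworks that make such ideas hard to carry out. So this time I chose to go through the publishing process myself: cover design, typesetting, sending files to the printer, applying for an ISBN, and coordinating sales channels were all done by myself &amp;&amp; with help from contractors.</p>
<p>Why publish on my own? One reason is that I want to choose topics I am interested in, or invite authors I like to write books, and personally read every book I handle. The idea came from working with a Japanese publisher. I found that their editor, although not a programmer, personally typed along with the commands and code examples in my book to make sure all of the code ran correctly.</p>
<p>Also, the Japanese manuscript I originally gave the editor looked like this:</p>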
<blockquote>
<p>「俺の財宝か？欲しけりゃくれてやるぜ…探してみろ　この世の全てをそこにおいてきた」</p>
<p>《ONEPIECE》ゴール・D・ロジャー</p>
</blockquote>
<p>And in the hands of the publisher's editor it became this:</p>
<figure class="w-full"><p><img decoding="async" loading="lazy" alt="為你自己學 Python" src="https://pythonbook.cc/assets/images/gitbook-jp-d6237be06e380bcce1e63bd98f5fc90e.webp" width="1317" height="184" class="img_VgU6"></p></figure>
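<p>I wonder whether you can spot the details here. Beyond the content, the editor also confirmed with me whether every image used in the book was licensed for publication. All of these details left a deep impression on me.</p>
<p>In addition, I once pitched some overseas publishers, offering to publish books of mine that had sold fairly well in Taiwan, but in the end they all declined. The reason was not the content. Their rules of the game are not to accept whatever the author has written as-is; instead, the author discusses the direction with their editors and then writes the work they are looking for. It may feel like the author loses some autonomy, but at least the content does not simply follow whatever the author feels like writing and drift off topic. I found that process very interesting too.</p>
<p>So I thought that if I ever had the chance to be an editor or to publish computer books, I would want to reach that standard as well.</p>
<p>After all, though, I am not a professional publisher. I may be able to control publishing quality, but I probably have no advantage in cost control. Simply put, with an amateur process like mine that still chases quality, selling out the first printing just about breaks even. Selling computer books in the internet age, you should be glad not to lose money.</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="封面設計">Cover Design<a href="https://pythonbook.cc/articles/python-book-in-print#%E5%B0%81%E9%9D%A2%E8%A8%AD%E8%A8%88" class="hash-link" aria-label="封面設計的直接連結" title="封面設計的直接連結">​</a></h3>
<p>Another reason I wanted to self-publish was to have a cover I like. To me, the cover is a book's first selling point. It may sound a bit shallow, but I really do buy books because of beautiful covers. So a few years ago I asked an illustrator I like very much, <a href="https://www.facebook.com/croter.taiwan" target="_blank" rel="noopener noreferrer">Croter</a>, to design the cover. The designer does not program, so during our discussions I offered the following ideas:</p>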
<ul>
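<li>Python is currently the most popular programming language in the world, and it is now a required subject in junior high school.</li>
<li>In Chinese, Python translates to 蟒蛇 (the python snake), but pythons are a bit scary, so a smaller snake is fine too.</li>
<li>The book is mainly about teaching this programming language.</li>
<li>Finally, I love Dragon Ball (七龍珠). It was basically my whole childhood, and the passing of its author made me very sad. Python is a snake, not a dragon, but I wondered whether the design could include a small tribute to Dragon Ball.</li>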
</ul>
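<p>The designer is brilliant and immediately captured the style I wanted, which is the cover the book has now. I have to say I really love this cover; from the color palette to the character design, there are details everywhere. I know AI image generation keeps getting better, but I still prefer this kind of hand-drawn design with warmth. I am honored and delighted to have had a master give my book such a beautiful cover.</p>
<p>The covers for four or five upcoming books (on React, JavaScript, Ruby on Rails, and others) were also drawn by him and are already finished. I love every one of them, so stay tuned.</p>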
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="編輯排版">Editing and Typesetting<a href="https://pythonbook.cc/articles/python-book-in-print#%E7%B7%A8%E8%BC%AF%E6%8E%92%E7%89%88" class="hash-link" aria-label="編輯排版的直接連結" title="編輯排版的直接連結">​</a></h3>
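<p>In fact, I originally planned to do even the typesetting myself, but after learning a bit of InDesign I realized how much expertise is involved. If I did that myself too, the book would probably never come out. So in the end I commissioned a professional design company, <a href="https://www.elephantshine.com/" target="_blank" rel="noopener noreferrer">象晴設計</a>, to handle the typesetting. Their designers are highly skilled at book layout and very patient. After several rounds of proofreading this book finally came to be; otherwise who knows how long it would have dragged on...</p>
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="結語">Closing<a href="https://pythonbook.cc/articles/python-book-in-print#%E7%B5%90%E8%AA%9E" class="hash-link" aria-label="結語的直接連結" title="結語的直接連結">​</a></h2>
<p>This is the first book I have published myself and my first attempt at handling the publishing process on my own. It was a bit exhausting, but it gave me a better understanding of the publishing industry. If you also want to write a computer book and share my views, feel free to <a href="https://5xcampus.com/contact" target="_blank" rel="noopener noreferrer">contact us</a> to talk about the details.</p>
<p>I spent most of this year preparing this book. I hope you like it, and I hope it helps more people who want to learn Python programming.</p>
<p>Thank you all :)</p>]]></content>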
        <author>
            <name>高見龍</name>
            <uri>https://kaochenlong.com</uri>
        </author>
        <category label="Python" term="Python"/>
        <category label="Programming" term="Programming"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Unimportant Boolean Trivia]]></title>
        <id>https://pythonbook.cc/articles/boolean-in-python</id>
        <link href="https://pythonbook.cc/articles/boolean-in-python"/>
        <updated>2024-10-19T00:00:00.000Z</updated>
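        <summary type="html"><![CDATA[In Python, Boolean values are actually integers, because the bool class inherits from int. True and False correspond to the numbers 1 and 0, and each exists as a single shared object in a program. In addition, bool cannot be subclassed, and True and False cannot be redefined in Python 3. Early versions of Python (before 2.2) had no Boolean type, so developers used 0 and 1 instead, until PEP 285 formally introduced Booleans as a standard built-in type.]]></summary>
        <content type="html"><![CDATA[<p>Some online tutorials say that <code>True</code> and <code>False</code>, the values of Python's Boolean type (bool), can be converted to the numbers 1 and 0. That is not wrong, but fewer people may know that in Python a Boolean value actually is a kind of number.</p>
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="布林值就是一種整數">A Boolean Is a Kind of Integer<a href="https://pythonbook.cc/articles/boolean-in-python#%E5%B8%83%E6%9E%97%E5%80%BC%E5%B0%B1%E6%98%AF%E4%B8%80%E7%A8%AE%E6%95%B4%E6%95%B8" class="hash-link" aria-label="布林值就是一種整數的直接連結" title="布林值就是一種整數的直接連結">​</a></h2>
<p>If you do not believe it, try this:</p>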
<div class="language-console codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-console codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; bool.__base__</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&lt;class 'int'&gt;</span><br></span></code></pre></div></div>
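<p>The <code>__base__</code> attribute returns a class's parent class, so you can see that the parent of bool is the integer type int. There is an inheritance relationship between them, so a Boolean is a kind of integer.</p>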
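<p>As a small illustration of that inheritance relationship, the sketch below checks it from a few angles. The list <code>values</code> is made-up sample data, not something from the article.</p>

```python
# bool inherits from int, so True and False behave like 1 and 0.
print(bool.__base__)
print(issubclass(bool, int))
print(isinstance(True, int))
print(True == 1, False == 0)

# Arithmetic works on Booleans as it does on integers.
total = True + True
print(total)

# A common use: sum() over Boolean results counts the matches.
values = [3, 4, 8, 11]  # hypothetical sample data
even_count = sum(value % 2 == 0 for value in values)
print(even_count)
```

<p>Counting with <code>sum()</code> works only because each comparison result is itself an integer.</p>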
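<p>Moreover, only one True and one False exist in Python, and they are built into the interpreter. No matter how you create a new Boolean object, the result points to the same object. For example:</p>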
<div class="language-console codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-console codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; a = bool(123)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; b = bool("xyz")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; a is b</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">True</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; a is True</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">True</span><br></span></code></pre></div></div>
<p>As you can see, they are the very same object.</p>
<p>Also, if the code uses a Boolean condition whose outcome is obvious at a glance, for example:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"這是真的"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>Because this branch always runs, Python removes the <code>if True</code> test during compilation and keeps only the code beneath it:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">0           0 RESUME                   0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">1           2 NOP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">2           4 PUSH_NULL</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            6 LOAD_NAME                0 (print)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            8 LOAD_CONST               1 ('這是真的')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">           10 CALL                     1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">           18 POP_TOP</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">           20 RETURN_CONST             2 (None)</span><br></span></code></pre></div></div>
<p>Conversely, if you write this:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"這根本不會發生"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>Because this can never happen, Python ignores this code entirely and never compiles it into bytecode:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">0           0 RESUME                   0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">1           2 RETURN_CONST             1 (None)</span><br></span></code></pre></div></div>
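<p>One way to check this behavior yourself is with the standard <code>dis</code> module. The sketch below is only an illustration: the helper name <code>opnames</code> and the sample strings are assumptions, and the exact instruction names vary between CPython versions, so it only looks for the <code>LOAD_NAME</code> instruction that fetches <code>print</code>.</p>

```python
import dis


def opnames(source):
    """Compile source as a module and return its bytecode instruction names."""
    code = compile(source, "example", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


always = opnames('if True:\n    print("yes")')
never = opnames('if False:\n    print("no")')

# print is looked up only in the branch that can actually run.
print("LOAD_NAME" in always)
print("LOAD_NAME" in never)
```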
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="布林類別不能被繼承">The bool Class Cannot Be Subclassed<a href="https://pythonbook.cc/articles/boolean-in-python#%E5%B8%83%E6%9E%97%E9%A1%9E%E5%88%A5%E4%B8%8D%E8%83%BD%E8%A2%AB%E7%B9%BC%E6%89%BF" class="hash-link" aria-label="布林類別不能被繼承的直接連結" title="布林類別不能被繼承的直接連結">​</a></h2>
<p>Some of Python's built-in types can be subclassed, such as <code>int</code>, <code>str</code>, <code>list</code>, and <code>dict</code>, but <code>bool</code> cannot. Trying to inherit from bool produces an error:</p>
<div class="language-console codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-console codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; class Cat(bool):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...   pass</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Traceback (most recent call last):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  File "&lt;stdin&gt;", line 1, in &lt;module&gt;</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">TypeError: type 'bool' is not an acceptable base type</span><br></span></code></pre></div></div>
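<p>The same contrast can be written as a script. This is a minimal sketch; the class name <code>Score</code> is a hypothetical example, while <code>Cat</code> follows the console session above.</p>

```python
# Subclassing int is allowed.
class Score(int):
    pass


bonus = Score(3) + 1
print(bonus)

# Subclassing bool raises TypeError as soon as the class statement runs.
try:
    class Cat(bool):
        pass
except TypeError as error:
    message = str(error)
    print(message)
```

<p>The error is raised when the class is defined, not when it is first instantiated, so the <code>try</code> block has to wrap the <code>class</code> statement itself.</p>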
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="其實本來只有-0-跟-1">Originally There Were Only 0 and 1<a href="https://pythonbook.cc/articles/boolean-in-python#%E5%85%B6%E5%AF%A6%E6%9C%AC%E4%BE%86%E5%8F%AA%E6%9C%89-0-%E8%B7%9F-1" class="hash-link" aria-label="其實本來只有 0 跟 1的直接連結" title="其實本來只有 0 跟 1的直接連結">​</a></h2>
<p>One last bit of trivia: before Python 2.2, Python had no Boolean type. Early developers often used 0 and 1 in place of Boolean values, or defined <code>True</code> and <code>False</code> themselves with statements such as <code>True = 1</code> or <code>False = 0</code>. Only with <a href="https://peps.python.org/pep-0285/" target="_blank" rel="noopener noreferrer">PEP 285</a> were Booleans formally introduced, with <code>int</code> as their parent class. Even after PEP 285, however, <code>True</code> and <code>False</code> could still be redefined in the Python 2 era, like this:</p>
<div class="language-console codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-console codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain"># Note: this is Python 2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; True = False</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; print(True)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">False</span><br></span></code></pre></div></div>
<p>It was not until Python 3 that <code>True</code> and <code>False</code> became reserved words that cannot be redefined.</p>]]></content>
        <author>
            <name>高見龍</name>
            <uri>https://kaochenlong.com</uri>
        </author>
        <category label="Python" term="Python"/>
        <category label="Programming" term="Programming"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Performance Comparison of String Concatenation in Python]]></title>
        <id>https://pythonbook.cc/articles/string-concatenation-performance-in-python</id>
        <link href="https://pythonbook.cc/articles/string-concatenation-performance-in-python"/>
        <updated>2024-10-17T00:00:00.000Z</updated>
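        <summary type="html"><![CDATA[This article looks at the performance differences between ways of concatenating strings in Python, including +, +=, f-strings, and the .join() method. Through bytecode analysis and actual benchmarks, it explains how each approach works internally along with its pros and cons; performance also varies with the number of strings.]]></summary>
        <content type="html"><![CDATA[<p>There are several ways to build a "Hello World" string in Python. Some look very simple, but you can also write it in quite a long-winded way:</p>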
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Approach 1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello "</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Approach 2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str2 </span><span class="token operator" style="color:#393A34">+=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Approach 3</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">a </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">b </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">a</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">b</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span 
class="token comment" style="color:#999988;font-style:italic"># Approach 4</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">words </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"Hello"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str4 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">" "</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">words</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>Take a guess: which of these four approaches is likely to perform worst? I say "likely" because results may differ across versions or hardware.</p>
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="不同的串接方式">Different Ways to Concatenate<a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E4%B8%8D%E5%90%8C%E7%9A%84%E4%B8%B2%E6%8E%A5%E6%96%B9%E5%BC%8F" class="hash-link" aria-label="不同的串接方式的直接連結" title="不同的串接方式的直接連結">​</a></h2>
<p>To avoid subjective guessing or wishful thinking, I will first compare the bytecode compiled from each of these approaches.</p>
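<p>Bytecode listings of this kind can be produced with the standard <code>dis</code> module. The following is a minimal sketch rather than the exact procedure used for this article; the variable names <code>source</code> and <code>listing</code> are assumptions, and the instructions shown depend on the Python version you run it on.</p>

```python
import dis

# One of the snippets compared in this article, as a source string.
source = 'str2 = "Hello "\nstr2 += "World"'

# dis.Bytecode compiles the source string; dis() returns the listing as text.
listing = dis.Bytecode(source).dis()
print(listing)
```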
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="使用-">Using <code>+</code><a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E4%BD%BF%E7%94%A8-" class="hash-link" aria-label="使用-的直接連結" title="使用-的直接連結">​</a></h3>
<p>The first approach:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">str1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello "</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><br></span></code></pre></div></div>
<p>The compiled bytecode is as follows:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">  1           2 LOAD_CONST               0 ('Hello World')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              4 STORE_NAME               0 (str1)</span><br></span></code></pre></div></div>
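<p>Although the original code adds the two strings <code>"Hello "</code> and <code>"World"</code>, Python determines that both are constants and that the <code>+</code> between them is plain string concatenation, so it computes the result at compile time. This reduces the run-time workload and avoids extra memory allocation. The optimization is known as "Constant Folding". From the bytecode you can see that this approach should perform about the same as assigning <code>str1 = "Hello World"</code> directly, which makes it the fastest of the four.</p>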
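<p>Constant folding can also be observed without reading bytecode, by inspecting the constants stored in the compiled code object. This is a small sketch under the assumption that you are running a recent CPython; the names <code>folded</code> and <code>augmented</code> are made up for the example.</p>

```python
folded = compile('str1 = "Hello " + "World"', "example", "exec")
augmented = compile('str2 = "Hello "\nstr2 += "World"', "example", "exec")

# After folding, the joined string is stored as a single constant.
print("Hello World" in folded.co_consts)

# With +=, the two pieces stay separate and are joined at run time.
print("Hello World" in augmented.co_consts)
print("Hello " in augmented.co_consts, "World" in augmented.co_consts)
```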
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="使用--1">Using <code>+=</code><a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E4%BD%BF%E7%94%A8--1" class="hash-link" aria-label="使用--1的直接連結" title="使用--1的直接連結">​</a></h3>
<p>The second approach:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">str2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str2 </span><span class="token operator" style="color:#393A34">+=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><br></span></code></pre></div></div>
<p>The compiled bytecode is as follows:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">  1           2 LOAD_CONST               0 ('Hello ')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              4 STORE_NAME               0 (str2)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  2           6 LOAD_NAME                0 (str2)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              8 LOAD_CONST               1 ('World')</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">             10 BINARY_OP               13 (+=)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             14 STORE_NAME               0 (str2)</span><br></span></code></pre></div></div>
<p>Compared with the first approach, there is no constant folding and there are more instructions, including an additional <code>BINARY_OP</code> instruction. The function that implements it, <code>PyUnicode_Concat()</code>, is not slow, but every execution produces a new string object, so this cannot be faster than the first approach.</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="使用-f-字串">Using F-Strings<a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E4%BD%BF%E7%94%A8-f-%E5%AD%97%E4%B8%B2" class="hash-link" aria-label="使用 F 字串的直接連結" title="使用 F 字串的直接連結">​</a></h3>
<p>The third approach:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">a </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">b </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">a</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">b</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><br></span></code></pre></div></div>
<p>The compiled bytecode is as follows:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">  1           2 LOAD_CONST               0 ('Hello')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              4 STORE_NAME               0 (a)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  2           6 LOAD_CONST               1 ('World')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              8 STORE_NAME               1 (b)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  3          10 LOAD_NAME                0 (a)</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">             12 FORMAT_VALUE             0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             14 LOAD_CONST               2 (' ')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             16 LOAD_NAME                1 (b)</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">             18 FORMAT_VALUE             0</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">             20 BUILD_STRING             3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             22 STORE_NAME               2 (str3)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             24 RETURN_CONST             3 (None)</span><br></span></code></pre></div></div>
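<p>F-strings are one of Python's string formatting mechanisms. When you need to embed variables in a string, they are usually far more readable than piecing the string together by hand. That readability has a price, though: this version compiles to more bytecode instructions. Two things are worth noting. First, <code>FORMAT_VALUE</code> runs twice, once per placeholder. Second, the final <code>BUILD_STRING</code> creates a new string object. In CPython's source, <code>FORMAT_VALUE</code> is implemented on top of <code>PyObject_Format()</code>, which calls the object's <code>__format__()</code> method and can itself produce a new string object (CPython may skip that call when the value is already a plain <code>str</code> with no format spec). Even so, with the extra instructions, this approach should perform somewhat worse than the second one.</p>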
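<p>To reproduce a listing like the one above on your own interpreter, here is a minimal sketch using the standard <code>dis</code> module. The filename string <code>"fstring_example"</code> is just an illustrative label, and opcode names and offsets can differ between Python versions, so the sketch filters on the <code>FORMAT_</code> prefix rather than on one exact name.</p>

```python
import dis

# The same three statements as the f-string example above.
source = 'a = "Hello"\nb = "World"\nstr3 = f"{a} {b}"\n'
code = compile(source, "fstring_example", "exec")

# Collect opcode names rather than offsets, since offsets vary by version.
opnames = [ins.opname for ins in dis.get_instructions(code)]
format_ops = [name for name in opnames if name.startswith("FORMAT_")]

dis.dis(code)                      # full listing for the running interpreter
print(format_ops)                  # one formatting instruction per placeholder
print("BUILD_STRING" in opnames)   # the instruction that assembles the result
```

<p>Running this on a different Python version is a quick way to see whether the instructions discussed here still apply to your setup.</p>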
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="使用-join-方法">Using the <code>join()</code> method<a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E4%BD%BF%E7%94%A8-join-%E6%96%B9%E6%B3%95" class="hash-link" aria-label="Direct link to Using the join() method" title="Direct link to Using the join() method">​</a></h3>
<p>Now for the fourth approach:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">words </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"Hello"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str4 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">" "</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">words</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>The compiled bytecode looks like this:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">  1           2 LOAD_CONST               0 ('Hello')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              4 LOAD_CONST               1 ('World')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              6 BUILD_LIST               2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              8 STORE_NAME               0 (words)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  2          10 LOAD_CONST               2 (' ')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             12 LOAD_ATTR                3 (NULL|self + join)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             32 LOAD_NAME                0 (words)</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">             34 CALL                     1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             42 STORE_NAME               2 (str4)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">             44 RETURN_CONST             3 (None)</span><br></span></code></pre></div></div>
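<p>The key instruction here is the <code>CALL</code> that invokes <code>join()</code>, which also creates a new string. Fewer instructions do not automatically mean better performance. Behind this <code>CALL</code> is the <code>PyUnicode_Join()</code> function, which in turn uses <code>_PyUnicode_JoinArray()</code>: it first computes the total length of all the strings to be joined, then allocates the memory for the result in one go. That sounds great, but our example only concatenates two strings, so computing the total length before creating the string object buys us nothing. I would expect it to be a bit slower than the second approach, but slightly faster than the third.</p>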
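<p>The "measure first, allocate once" idea can be sketched in plain Python. This is only a toy model, not CPython's C code: the function name <code>join_two_pass</code> is hypothetical, and a list of characters stands in for the string buffer that the real implementation fills directly.</p>

```python
def join_two_pass(sep, parts):
    """Toy model of a two-pass join: measure first, then fill one buffer."""
    parts = list(parts)
    if not parts:
        return ""

    # Pass 1: work out the final length before allocating anything.
    total = sum(len(p) for p in parts) + len(sep) * (len(parts) - 1)

    # One allocation sized for the whole result.
    buf = [""] * total
    pos = 0

    # Pass 2: copy every piece into its final position.
    for i, part in enumerate(parts):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(part)] = part
        pos += len(part)

    # Turning the character list back into a str is an artifact of this
    # Python model; the C code writes straight into the new string.
    return "".join(buf)


print(join_two_pass(" ", ["Hello", "World"]))
```

<p>With only two short strings, the first pass is pure overhead, which matches the reasoning above. The single allocation only pays off once there are many pieces.</p>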
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="效能測試">Benchmark<a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E6%95%88%E8%83%BD%E6%B8%AC%E8%A9%A6" class="hash-link" aria-label="Direct link to Benchmark" title="Direct link to Benchmark">​</a></h2>
<p>Let's measure how these four approaches actually perform. My test environment:</p>
<ul>
<li>Apple M1 Pro (2021)</li>
<li>64 GB of memory</li>
<li>macOS 14.4.1</li>
<li>Python 3.12.7</li>
</ul>
<p>I use the built-in <code>timeit</code> module to time the four approaches:</p>
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> timeit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> timeit</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 前置作業</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">a </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">b </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">words </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">a</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> b</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 測試函數</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_str1</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    str1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello "</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_str2</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    str2 </span><span class="token operator" style="color:#393A34">=</span><span 
class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello "</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    str2 </span><span class="token operator" style="color:#393A34">+=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"World"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_str3</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    str3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">a</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">b</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_str4</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    str4 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">" "</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">words</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">n </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">10_000_000</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 測試 1 千萬次</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 實測！</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">t1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_str1()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">t2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_str2()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">t3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_str3()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">t4 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_str4()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 顯示結果</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str1 (字串相加): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t1</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> 秒"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str2 (+= 累加): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t2</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> 秒"</span><span 
class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str3 (f 字串): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t3</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> 秒"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str4 (join 方法): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t4</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> 秒"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
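<p>Because I want to focus on the cost of the concatenation itself, and to keep the comparison fair, I moved the variable assignments and the creation of the list outside the test functions, then ran each function 10 million times. The results:</p>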
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">str1 (字串相加): 0.247897 秒</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str2 (+= 累加): 0.535874 秒</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">str3 (f 字串): 0.674152 秒</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str4 (join 方法): 0.620409 秒</span><br></span></code></pre></div></div>
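<p>These numbers will vary from one environment to another, but on my machine, unsurprisingly, the first approach is the fastest (because it is almost equivalent to assigning the string directly, <code>str1 = "Hello World"</code>). The slowest is the third approach, the f-string, and <code>.join()</code> beats it by a narrow margin.</p>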
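<p>Since individual timings fluctuate between runs and machines, one common precaution is to repeat each measurement and keep the fastest run. The sketch below uses <code>timeit.repeat</code> for that. The run counts (<code>number=100_000</code>, <code>repeat=5</code>) are arbitrary values picked for illustration, not the settings used in the benchmark above.</p>

```python
from timeit import repeat

words = ["Hello", "World"]


def test_join():
    return " ".join(words)


# Five independent timings of 100,000 calls each. The minimum is the run
# least disturbed by other activity on the machine.
runs = repeat(test_join, number=100_000, repeat=5)
best = min(runs)

print(f"runs: {[round(r, 6) for r in runs]}")
print(f"best of {len(runs)}: {best:.6f} s")
```

<p>When two approaches are as close as the f-string and <code>.join()</code> are here, comparing the best of several runs gives a steadier picture than a single measurement.</p>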
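<h2 class="anchor anchorWithStickyNavbar_XzRG" id="量大的時候">When the volume grows...<a href="https://pythonbook.cc/articles/string-concatenation-performance-in-python#%E9%87%8F%E5%A4%A7%E7%9A%84%E6%99%82%E5%80%99" class="hash-link" aria-label="Direct link to When the volume grows..." title="Direct link to When the volume grows...">​</a></h2>
<p>In the benchmark above, apart from the first approach, the others do not differ much. That is only because so few strings are being joined. As the number of strings grows, the differences become obvious, especially for the last approach, <code>.join()</code>. This time I raise the number of strings to 100,000 and compare again. Because the first approach amounts to assigning a literal string directly, I leave it out and test only the other three:</p>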
<div class="language-python codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-python codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> timeit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> timeit</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 前置作業</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">words </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string-interpolation string" style="color:#e3116c">f"hey</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">i</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">100000</span><span class="token punctuation" 
style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 測試函數</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_plus_equal</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> word </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> words</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        result </span><span class="token operator" style="color:#393A34">+=</span><span class="token plain"> word</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> result</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_fstring</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> word </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> words</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">result</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">word</span><span 
class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> result</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">test_join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">''</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">words</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 實測！</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">t2 </span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_plus_equal()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">t3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_fstring()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">t4 </span><span 
class="token operator" style="color:#393A34">=</span><span class="token plain"> timeit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"test_join()"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">globals</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">globals</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> number</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 顯示結果</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str2 (大量 += 累加): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t2</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token 
string-interpolation string" style="color:#e3116c"> 秒"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str3 (大量 f 字串): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t3</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> 秒"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"str4 (大量 join 方法): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">t4</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.6f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> 秒"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>I ran each test function 10 times; the results are as follows:</p>
<div class="language-plaintext codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-plaintext codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">str2 (大量 += 累加): 0.041256 秒</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">str3 (大量 f 字串): 7.548484 秒</span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">str4 (大量 join 方法): 0.004343 秒</span><br></span></code></pre></div></div>
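<p>The <code>.join()</code> method wins by a wide margin. Its performance was originally about the same as the second approach, <code>+=</code>, but once a large number of strings are being concatenated, the gap opens up considerably.</p>
<p>Why such a big difference? As mentioned earlier, before <code>.join()</code> starts concatenating, it first computes the total length of all the strings and allocates the required memory in one go. In the source code, <code>_PyUnicode_JoinArray()</code> contains the line <code>res = PyUnicode_New(sz, maxchar)</code>, which does exactly this; <code>sz</code> is that total length.</p>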
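<p>The "measure first, allocate once, then copy" idea can be sketched in plain Python. This is only an illustration of the two-pass strategy, not how CPython implements it: the function name <code>join_preallocated</code> is made up for this example, and the fixed-width UTF-32 encoding is an assumption chosen so that every character takes the same number of bytes.</p>

```python
def join_preallocated(parts):
    """Join strings in two passes, mimicking the idea behind str.join()."""
    # Hypothetical helper for illustration; CPython does this in C.
    encoded = [part.encode("utf-32-le") for part in parts]  # 4 bytes per character
    total = sum(len(chunk) for chunk in encoded)             # pass 1: total size
    buffer = bytearray(total)                                # one single allocation
    offset = 0
    for chunk in encoded:                                    # pass 2: copy into place
        buffer[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return buffer.decode("utf-32-le")


print(join_preallocated(["Py", "thon", " 3"]))
```

<p>Because the buffer is sized up front, nothing has to be reallocated or copied again as more pieces arrive, which is the cost that repeated concatenation can run into.</p>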
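<p>So should we always use <code>.join()</code> to concatenate strings from now on? Not necessarily. Judging from the results, when there are only a few strings, <code>.join()</code> gains very little, and you first have to put the strings into a list, so the code is no more readable. Unless I am concatenating a large number of strings, I would still choose <code>+=</code> or an f-string: simple to write, easy to read, and the performance isn't much worse.</p>]]></content>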
        <author>
            <name>高見龍</name>
            <uri>https://kaochenlong.com</uri>
        </author>
        <category label="Python" term="Python"/>
        <category label="Programming" term="Programming"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Bring Your Documents to Life - Hands-on RAG]]></title>
        <id>https://pythonbook.cc/articles/2024/9/11/rag-workshop</id>
        <link href="https://pythonbook.cc/articles/2024/9/11/rag-workshop"/>
        <updated>2024-09-11T00:00:00.000Z</updated>
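        <summary type="html"><![CDATA[This workshop walks participants through RAG and implements it with the Python programming language and the LangChain framework, eventually turning documents into a knowledge base that can be queried interactively.]]></summary>
        <content type="html"><![CDATA[<p>This is the content of a workshop at the <a href="https://hwdc.ithome.com.tw/2024" target="_blank" rel="noopener noreferrer">Hello World Dev Conference</a> hosted by iThome. The workshop walks everyone through RAG (Retrieval-Augmented Generation) and implements it with the Python programming language and the LangChain framework, eventually turning documents into a knowledge base that can be queried interactively.</p>
<p>These days, anyone who hasn't used an AI service will probably be classified as a fossil from the previous generation. I have to say these AI services really are impressive, but there has been one fairly big annoyance all along: sometimes they talk nonsense with a perfectly straight face. Even when they run into a question they can't answer, they sound so confident that you can't tell whether what they say is true.</p>
<p>For example, suppose you built a Knowledge Management (KM) website for your company, where the relevant internal rules can be looked up, and you put a chatbot on it. When a visitor asks the bot how much your products cost or how to return or exchange an item, it should say it doesn't know when it doesn't know, rather than squeezing out an answer. We can constrain the AI's answers with a suitable "prompt", but we would still like the AI not to make things up, and that is where RAG comes in.</p>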
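<h2 class="anchor anchorWithStickyNavbar_XzRG" id="重要請先看這裡">Important! Read This First!<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E9%87%8D%E8%A6%81%E8%AB%8B%E5%85%88%E7%9C%8B%E9%80%99%E8%A3%A1" class="hash-link" aria-label="Direct link to Important! Read This First!" title="Direct link to Important! Read This First!">​</a></h2>
<p>The network bandwidth at the venue may not be fast enough, and the model files we will use are fairly large. If you signed up for this workshop and want to follow along on site, I suggest finding some time beforehand to complete the following steps:</p>
<ol>
<li>Download the Ollama application</li>
</ol>
<p>Choose the appropriate version <a href="https://ollama.com/download" target="_blank" rel="noopener noreferrer">here</a>, then download and install it.</p>
<ol start="2">
<li>Download the models</li>
</ol>
<p>In this workshop I will use the <code>mistral</code> model as the example. It is roughly 4GB; you can also pick another <a href="https://ollama.com/library" target="_blank" rel="noopener noreferrer">model</a>. First, run this command to download the <code>mistral</code> model:</p>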
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ ollama pull mistral</span><br></span></code></pre></div></div>
<p>We will also use another, smaller model, <code>nomic-embed-text</code>, which you can pull ahead of time as well:</p>
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ ollama pull nomic-embed-text</span><br></span></code></pre></div></div>
<p>Or you can simply run it directly:</p>
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ ollama run mistral</span><br></span></code></pre></div></div>
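<p>If you haven't downloaded the <code>mistral</code> model yet, this command downloads it for you and drops you straight into chat mode, where you can start asking it questions:</p>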
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ ollama run mistral</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 你好 :)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">嗨，nice to meet you! 今天好吗？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">(Hello, nice to meet you! How are you today?)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">如果有需要帮助的话，我会尽力帮到您。</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">(If you need any help, I will do my best to assist.)</span><br></span></code></pre></div></div>
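<p>If you have time, you may as well install these two too:</p>
<ul>
<li>Python: download a suitable version from the <a href="https://www.python.org/downloads/" target="_blank" rel="noopener noreferrer">official Python website</a>; this one isn't difficult.</li>
<li>Text editor: this workshop uses <a href="https://code.visualstudio.com/" target="_blank" rel="noopener noreferrer">VS Code</a>, but feel free to use whatever development tool you are comfortable with.</li>
</ul>
<p>That more or less completes the time-consuming installs.</p>
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="名詞解釋">Terminology<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E5%90%8D%E8%A9%9E%E8%A7%A3%E9%87%8B" class="hash-link" aria-label="Direct link to Terminology" title="Direct link to Terminology">​</a></h2>
<p>In the AI era, jargon and acronyms are flying everywhere, so let's start with some definitions.</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="llm">LLM<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#llm" class="hash-link" aria-label="Direct link to LLM" title="Direct link to LLM">​</a></h3>
<p>LLM (Large Language Model) is translated into Chinese as 「大型語言模型」, but that translation barely tells you anything. Think about it: in the past, if we wanted to work out what a user typed, or the meaning behind it, especially in Chinese, we would probably need word-segmentation dictionaries to handle it. And now that many people mix up 「再」 with 「在」 and 「應」 with 「因」, figuring out the intent or sentiment of a piece of text is basically not easy.</p>
<p>Most of us, myself included, don't really know how it's done, but LLMs did it. An LLM can interpret the user's "input", understand the question being asked, and answer it under a suitable "prompt"; even a few typos won't affect the result. Simply put, the LLM takes care of input and output for us.</p>
<p>The question is, what Model is the M in LLM, and how is it trained? Mere mortals like us probably don't have the means to train models; mostly we can only watch the godlike companies fight it out and pick up what falls out of the fight, such as open models like Llama3 from Meta (Facebook), Google's Gemma, and Mistral AI's mistral.</p>
<p>Still, even once we have picked up these trained models, most of us can't easily train them further to make them smarter. We can ask the LLM to answer more precisely with a sharper prompt, but an LLM isn't as smart as we imagine; it is actually very forgetful. For example, when you chat with ChatGPT, it seems to remember what you have talked about. That is because in each exchange, the "context" of the same thread is summarized and sent along to ChatGPT with the next message. But this accumulating of context has a limit, known as the "context window", and that window can't stay open forever. In particular, if the context you want to query is one or several PDF e-books, this probably won't work.</p>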
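<p>To make the limit concrete, here is a minimal sketch of one simple way to keep a conversation inside a fixed budget: drop the oldest messages first. It is not how ChatGPT manages context; the function name <code>fit_context</code> is made up for this example, and counting characters is only a crude stand-in for counting tokens.</p>

```python
def fit_context(messages, budget):
    """Keep the most recent messages whose combined length fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = len(message)             # assumption: characters stand in for tokens
        if used + cost > budget:
            break                       # everything older than this is dropped
        kept.append(message)
        used += cost
    kept.reverse()                      # restore chronological order
    return kept


history = ["hello", "tell me about RAG", "what is an embedding?"]
print(fit_context(history, budget=40))
# ['tell me about RAG', 'what is an embedding?']
```

<p>Anything that falls outside the budget is simply never seen by the model, which is why a whole PDF book cannot just be pasted into the conversation.</p>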
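<p>Secondly, even the models trained by these big companies are usually trained on data that already exists. If you ask one for the latest lottery numbers, or for today's closing price of TSMC, it most likely won't know. Also, because these models are trained on public data, if you ask about your company's leave policy, it shouldn't know that either.</p>
<p>So what does the LLM answer in these cases? What else can it do? It makes something up!</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="rag">RAG<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#rag" class="hash-link" aria-label="Direct link to RAG" title="Direct link to RAG">​</a></h3>
<p>RAG (Retrieval-Augmented Generation) is translated into Chinese as 「檢索增強生成」. The point of RAG is not the retrieval or the generation, since an LLM can already do those; the point is the "Augmented" part, which lets the LLM answer our questions more precisely.</p>
<p>At this point it may sound similar to another term, "fine-tuning", but the two are rather different concepts. Fine-tuning means taking an existing model and training and adjusting it further with domain-specific data; for example, feeding it medical data to improve the model's knowledge in that area.</p>
<p>RAG does not adjust the original model. Instead, when we prompt the LLM, we supply extra parameters or information so that the LLM can answer more accurately. Simply put, RAG gives the LLM "extra knowledge"; put even more plainly, it makes the prompt you give the LLM more precise.</p>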
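<p>In other words, the "augmentation" can be as plain as pasting retrieved text into the prompt. The following is a minimal sketch of that step only, assuming the relevant passages have already been found; <code>build_prompt</code>, the prompt wording, and the sample leave-policy sentence are all made up for illustration.</p>

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved passages and the user's question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical data: this policy sentence is invented for the example.
chunks = ["Employees receive 10 days of annual leave per year."]
print(build_prompt("How many days of annual leave do I get?", chunks))
```

<p>The model itself is unchanged; only the text it receives is richer, which is the difference from fine-tuning.</p>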
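<p>Often, when you think you can make an LLM smarter with fine-tuning, what you actually need may be RAG.</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="embedding">Embedding<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#embedding" class="hash-link" aria-label="Direct link to Embedding" title="Direct link to Embedding">​</a></h3>
<p>Embedding, translated into Chinese as 「嵌入」, is a word whose meaning is hard to picture from the term itself. Simply put, embedding is a process or technique that converts data into "vectors", which lets AI better "understand" data and compare how "similar" pieces of data are.</p>
<p>What is a "vector"? Say I want to compare four fruits, banana (香蕉), guava (芭樂), apple (蘋果), and durian (榴槤), by smell and by how soft or hard their texture is. If I plot smell and texture in a two-dimensional space, it might look like this:</p>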
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">Smelly   ^</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |                      Durian (0.9, 0.9)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         |         Apple (0.5, 0.2) Guava (0.6, 0.2)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Fragrant |     Banana (0.3, 0.1)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         +----------------------------------&gt; x</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          Soft                           Hard</span><br></span></code></pre></div></div>
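<p>By "scoring" these fruits, we can pin down each one's position in this two-dimensional space. The closer two fruits are in the space, the more similar their smell and texture.</p>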
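<p>"Closer means more similar" can be checked with a few lines of Python. This sketch reuses the coordinates from the diagram above and measures straight-line (Euclidean) distance with the standard library's <code>math.dist</code>; the helper name <code>nearest</code> is made up for this example, and Euclidean distance is just one possible way to measure closeness.</p>

```python
import math

# Coordinates taken from the diagram: (texture, smell)
fruits = {
    "banana": (0.3, 0.1),
    "guava": (0.6, 0.2),
    "apple": (0.5, 0.2),
    "durian": (0.9, 0.9),
}


def nearest(name):
    """Return the fruit closest to the given one in the 2D space."""
    origin = fruits[name]
    others = (other for other in fruits if other != name)
    return min(others, key=lambda other: math.dist(origin, fruits[other]))


print(nearest("apple"))   # guava
print(nearest("banana"))  # apple
```

<p>Apple and guava sit only 0.1 apart, so they come out as each other's nearest neighbour, while durian is far from everything else.</p>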
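<p>Back to our topic: embedding lets us convert data into vectors, so that we can compare the similarity of data through those vectors. The vectors in question aren't just two-dimensional like the ones here; they may have hundreds or even more than a thousand dimensions. Comparing similarity through vectors not only helps fill in things the LLM didn't originally know, it also makes it easier to find precise data.</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="向量資料庫">Vector Database<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E5%90%91%E9%87%8F%E8%B3%87%E6%96%99%E5%BA%AB" class="hash-link" aria-label="Direct link to Vector Database" title="Direct link to Vector Database">​</a></h3>
<p>If data has to be converted into vectors every time, performance may suffer, so it's important to have a place to store these vectors. You might think of saving the vectors in a text file, but then every time you look for similar data you have to read all the vectors in and compare them one by one, which may also be a performance problem.</p>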
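<p>The "read everything and compare one by one" approach looks roughly like the sketch below. It is a plain-Python illustration using cosine similarity, with made-up two-dimensional vectors and hypothetical names (<code>cosine_similarity</code>, <code>most_similar</code>, <code>doc-a</code> and so on); real embeddings have far more dimensions.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, stored, top_k=2):
    """Score every stored vector against the query and return the best labels."""
    scored = [(cosine_similarity(query, vector), label) for label, vector in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _score, label in scored[:top_k]]


# Hypothetical toy data standing in for stored embeddings.
stored = {"doc-a": [1.0, 0.0], "doc-b": [0.8, 0.6], "doc-c": [0.0, 1.0]}
print(most_similar([1.0, 0.1], stored))  # ['doc-a', 'doc-b']
```

<p>Every query touches every stored vector, so the cost grows with the amount of data; that linear scan is the part a dedicated store is meant to take off our hands.</p>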
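<p>Fortunately, there are now databases built specifically for storing vectors. <a href="https://www.trychroma.com/" target="_blank" rel="noopener noreferrer">Chroma</a>, which we'll use in this workshop, is one such database, and it comes with some magical superpowers that let us find similar vectors without knowing much about the algorithms.</p>
<h2 class="anchor anchorWithStickyNavbar_XzRG" id="動手作">Hands-on!<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E5%8B%95%E6%89%8B%E4%BD%9C" class="hash-link" aria-label="Direct link to Hands-on!" title="Direct link to Hands-on!">​</a></h2>
<p>All right, enough storytelling; it's time to build something!</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="準備工作">Preparation<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E6%BA%96%E5%82%99%E5%B7%A5%E4%BD%9C" class="hash-link" aria-label="Direct link to Preparation" title="Direct link to Preparation">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_XzRG" id="程式語言及開發工具">Programming Language and Development Tools<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80%E5%8F%8A%E9%96%8B%E7%99%BC%E5%B7%A5%E5%85%B7" class="hash-link" aria-label="Direct link to Programming Language and Development Tools" title="Direct link to Programming Language and Development Tools">​</a></h4>
<p>No particular programming language is required, but given the current landscape, Python is probably the best choice. That's not because Python as a language is especially powerful, but because the Python ecosystem has many relevant packages that can save us a lot of time.</p>
<p>As for development tools, the code we're going to write isn't complicated, so an ordinary text editor is plenty. If you don't have a preferred editor, consider Visual Studio Code (VS Code for short).</p>
<h4 class="anchor anchorWithStickyNavbar_XzRG" id="環境安裝">Environment Setup<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E7%92%B0%E5%A2%83%E5%AE%89%E8%A3%9D" class="hash-link" aria-label="Direct link to Environment Setup" title="Direct link to Environment Setup">​</a></h4>
<p>You can download the installer from the official Python website, or install Python through <code>pyenv</code>; for details, see the <a href="https://pythonbook.cc/chapters/basic/install-python">Environment Setup</a> chapter.</p>
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="手工打造">Building It by Hand<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E6%89%8B%E5%B7%A5%E6%89%93%E9%80%A0" class="hash-link" aria-label="Direct link to Building It by Hand" title="Direct link to Building It by Hand">​</a></h3>
<p>Before we start, I'd like to ask Ollama a question and see how it answers:</p>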
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ ollama run mistral</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 請問高見龍是誰？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">高見龍（高氏明龍，1592年-1645年）是中國明朝時期的一位知名詩人。他出生於江南，從小受教于家里，後入京立學。高見龍在文學界著名之作有《山水遺集</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">》、《詩集新選》等。他是明朝最後一位有名的北宋式詩人，對中國詩歌傳統產生了深刻影響。</span><br></span></code></pre></div></div>
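<p>Whoa, impressive! How come I never knew I was a poet? No matter; in a moment I'll borrow the content of <a href="https://pythonbook.cc/about">this page</a> and use RAG to improve the result.</p>
<p>To keep each project's packages from interfering with one another, I first create and switch to a virtual environment with Python's built-in <code>venv</code>:</p>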
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ python -m venv .venv</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">$ source .venv/bin/activate</span><br></span></code></pre></div></div>
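<p>If you're not familiar with Python virtual environments, see the "Using venv" section of <a href="https://pythonbook.cc/chapters/basic/install-python">Environment Setup</a>. Next, install the packages we're about to use: <code>ollama</code>, plus <code>numpy</code> for computing vector similarity:</p>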
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ pip install ollama numpy</span><br></span></code></pre></div></div>
<p>Next, I save the text from the "About me" page as a text file, <code>about_me.txt</code>, in the same directory, then create a <code>rag_v1.py</code> file and write some code:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v1.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">parse_paragraph</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">filename</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">filename</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">line</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span 
class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> line </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> f </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> __name__ </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"__main__"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    paragraphs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parse_paragraph</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
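<p>What <code>parse_paragraph()</code> does is fairly simple: it reads in <code>about_me.txt</code>, drops the blank lines, and returns the remaining lines as a list. Next, we use the Ollama package to convert each element of this list into a vector:</p>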
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v1.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ollama</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">calc_embedings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ollama</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings</span><span 
class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"embedding"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> data </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> paragraphs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> __name__ </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"__main__"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    paragraphs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parse_paragraph</span><span class="token 
punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">    embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> calc_embedings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
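<p>Pass the paragraphs we just processed to the <code>calc_embedings()</code> function and you end up with a list of computed vectors. Here I used the <code>mistral</code> model for the computation; you can swap in another model if you like. After running it you'll notice that although <code>about_me.txt</code> doesn't contain much text, there is a noticeable lag. On my 2021 M1 Mac laptop it takes two or three seconds to finish. That's because computing these vectors simply takes time (or better hardware), and the lag will likely grow as more data is fed in. We can't recompute like this every time, so next I'll write a function that saves the computed vectors after the first run, so that the same file doesn't need to be recomputed next time:</p>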
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v1.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ollama</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> os</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> json</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">cache_embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">filename</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embedding_file </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"cache/</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">filename</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">.json"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">isfile</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">embedding_file</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embedding_file</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> json</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">f</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">makedirs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"cache"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> exist_ok</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" 
style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> calc_embedings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embedding_file</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"w"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        json</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">dump</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> f</span><span class="token punctuation" 
style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> embeddings</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> __name__ </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"__main__"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    paragraphs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parse_paragraph</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">    embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> cache_embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
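<p>The <code>cache_embeddings()</code> function simply saves the computed vectors to disk and, if the cache file already exists, reads them back instead of recomputing them. It does have a flaw, though: if the contents of <code>about_me.txt</code> change, nothing is recalculated unless I delete the data in the <code>cache/</code> directory. That is fine for now. The only purpose of this hand-built section is to demonstrate the principle, and later you will see that other packages take care of this kind of chore for us, so let's accept the imperfection for the time being.</p>
<p>The first run will still stall just as before, but from the second run onward it should visibly finish almost at once.</p>
<p>Finally, let's add one more function, <code>calc_similar_vectors()</code>, to compute vector similarity:</p>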
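<p>If the stale-cache problem described above bothers you, one possible workaround is to put a hash of the file's contents into the cache file name, so that an edited document automatically misses the old cache. The following is only a minimal sketch under that assumption, not what the article's code does: the names <code>cache_path_for()</code> and <code>cache_embeddings_by_hash()</code> and the <code>compute</code> parameter are hypothetical, and <code>compute</code> stands in for <code>calc_embedings()</code> so the sketch runs without Ollama.</p>

```python
import hashlib
import json
import os


def cache_path_for(filename, cache_dir="cache"):
    """Build a cache path that changes whenever the file's contents change."""
    with open(filename, "rb") as f:
        # A short prefix of the SHA-256 digest is enough to tell versions apart.
        digest = hashlib.sha256(f.read()).hexdigest()[:16]
    return os.path.join(cache_dir, f"{os.path.basename(filename)}.{digest}.json")


def cache_embeddings_by_hash(filename, paragraphs, compute):
    """Hypothetical variant of cache_embeddings() keyed on the file contents.

    `compute` is any function that turns a list of paragraphs into a list
    of vectors (calc_embedings in the article).
    """
    embedding_file = cache_path_for(filename)

    # Same contents as before: reuse the stored vectors.
    if os.path.isfile(embedding_file):
        with open(embedding_file) as f:
            return json.load(f)

    # New or edited contents: compute and store under the new name.
    os.makedirs(os.path.dirname(embedding_file), exist_ok=True)
    embeddings = compute(paragraphs)

    with open(embedding_file, "w") as f:
        json.dump(embeddings, f)

    return embeddings
```

<p>With this approach, calling <code>cache_embeddings_by_hash("about_me.txt", paragraphs, calc_embedings)</code> would recompute only after the file has been edited. The trade-off is that cache files for older versions pile up in <code>cache/</code> until someone cleans them out.</p>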
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v1.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> numpy </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> linalg</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> dot</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">calc_similar_vectors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">v</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vectors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    v_norm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> linalg</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">norm</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">v</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">dot</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">v</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">v_norm </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> linalg</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">norm</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> item </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> vectors</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">sorted</span><span class="token punctuation" style="color:#393A34">(</span><span class="token builtin">enumerate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> reverse</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token operator" style="color:#393A34">=</span><span class="token keyword" style="color:#00009f">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" 
style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 略 ...</span><br></span></code></pre></div></div>
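<p>The math in this function may look a little involved, but it simply computes the cosine similarity between <code>v</code> and each vector in <code>vectors</code>. As mentioned earlier, the more similar two vectors are, the more similar the data they represent. The function returns a list of <code>(index, score)</code> tuples sorted by similarity score from highest to lowest, so we can tell which vectors are the most similar.</p>
<p>At this point the groundwork is more or less done. Next, let's put all the pieces together:</p>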
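<p>To make the math less mysterious, here is the same calculation spelled out in plain Python on three toy two-dimensional vectors. This is an illustrative sketch only: the <code>cosine_similarity()</code> helper and the sample vectors are made up for the example, and real embeddings have far more dimensions.</p>

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_product / (norm_a * norm_b)


query = [1.0, 0.0]
candidates = [
    [0.0, 1.0],   # perpendicular to the query
    [2.0, 0.0],   # same direction as the query, different length
    [-1.0, 0.0],  # opposite direction
]

scores = [cosine_similarity(query, item) for item in candidates]

# Same ranking step as calc_similar_vectors(): (index, score) pairs, best first.
ranked = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])
print(ranked)  # [(1, 1.0), (0, 0.0), (2, -1.0)]
```

<p>The vector pointing in the same direction scores 1.0 even though it is twice as long, the perpendicular one scores 0.0, and the opposite one scores -1.0. Cosine similarity looks only at direction, which is why it suits comparing embeddings.</p>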
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v1.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> __name__ </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"__main__"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    doc </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    paragraphs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parse_paragraph</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> cache_embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"> paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"請問你想問什麼問題？\n&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">!=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bye"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        prompt_embedding </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ollama</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">prompt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"embedding"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        similar_vectors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> calc_similar_vectors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">prompt_embedding</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> embeddings</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        system_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> 
</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"現在開始使用我提供的情境來回答，只能使用繁體中文，不要有簡體中文字。如果你不確定答案，就說不知道。情境如下："</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">vector</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> vector </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> similar_vectors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> 
ollama</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> system_prompt</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"message"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token 
builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>Let's run it:</p>
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ python rag_v1.py</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">請問你想問什麼問題？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 高見龍是誰？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">高見龍是一名網站開發者、講師、技術書作者、企業內訓及技術顧問。他是五倍學院（五倍紅寶石程式資訊教育股份有限公司）的負責人，並曾為 WebConf Taiwan 研討會發起人和主辦人，同時也是 PHPConf Taiwan 研討會發起人和主辦人。</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; bye</span><br></span></code></pre></div></div>
<p>Good, this looks much better than the poet we got earlier!</p>
<p>Complete code:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v1.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ollama</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> os</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> json</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> numpy </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> linalg</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> dot</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">parse_paragraph</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">filename</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">filename</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">line</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> line </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> f </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" 
style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">calc_embedings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ollama</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"embedding"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> data </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> paragraphs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    
</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">cache_embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">filename</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embedding_file </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"cache/</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">filename</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">.json"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" style="color:#393A34">.</span><span 
class="token plain">isfile</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embedding_file</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embedding_file</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> json</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">f</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">makedirs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"cache"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> exist_ok</span><span class="token operator" 
style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> calc_embedings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> </span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embedding_file</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"w"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> f</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        json</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">dump</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span 
class="token plain"> f</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> embeddings</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">calc_similar_vectors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">v</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vectors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    v_norm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> linalg</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">norm</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">v</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">dot</span><span class="token 
punctuation" style="color:#393A34">(</span><span class="token plain">v</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">v_norm </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> linalg</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">norm</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> item </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> vectors</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">sorted</span><span class="token punctuation" style="color:#393A34">(</span><span class="token builtin">enumerate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> reverse</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"> key</span><span class="token operator" style="color:#393A34">=</span><span class="token keyword" style="color:#00009f">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> __name__ </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"__main__"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    doc </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    paragraphs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parse_paragraph</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embeddings </span><span class="token 
operator" style="color:#393A34">=</span><span class="token plain"> cache_embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> paragraphs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"您想問什麼問題？\n&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">!=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bye"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        prompt_embedding </span><span class="token operator" style="color:#393A34">=</span><span 
class="token plain"> ollama</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">prompt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"embedding"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        similar_vectors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> calc_similar_vectors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">prompt_embedding</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> embeddings</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        system_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"現在開始使用我提供的情境來回答，只能使用繁體中文，不要有簡體中文字。如果你不確定答案，就說不知道。情境如下："</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">paragraphs</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">vector</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> vector </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> similar_vectors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ollama</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> system_prompt</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"message"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">        prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
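<p>The ranking step in <code>calc_similar_vectors()</code> can be tried on its own, without Ollama. Below is a minimal sketch that uses only the standard library instead of the <code>dot</code> and <code>linalg.norm</code> calls in the listing. The helper names <code>cosine_similarity()</code> and <code>rank_by_similarity()</code> and the toy two-dimensional vectors are made up for illustration; real embeddings have far more dimensions.</p>

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot_product = sum(x * y for x, y in zip(a, b))
    return dot_product / (math.hypot(*a) * math.hypot(*b))


def rank_by_similarity(v, vectors):
    # Return (index, score) pairs, most similar first, mirroring the
    # shape returned by calc_similar_vectors() in the listing above.
    scores = [cosine_similarity(v, item) for item in vectors]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "embeddings", not real model output.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

ranking = rank_by_similarity(query, candidates)
top_indexes = [index for index, _ in ranking[:2]]
print(top_indexes)  # [2, 1]
```

<p>Note that <code>[2.0, 0.0]</code> ranks first even though it is twice as long as the query: cosine similarity compares direction only, which is why the scores are divided by both norms.</p>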
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="使用-langchain">Using LangChain<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E4%BD%BF%E7%94%A8-langchain" class="hash-link" aria-label="Direct link to Using LangChain" title="Direct link to Using LangChain">​</a></h3>
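<p>The point of building everything by hand above was mainly to understand how RAG works. In practice you don't have to work this hard: plenty of generous, talented people in the Python world have already built ready-made packages, and standing on the shoulders of these giants is a lot more efficient. Next, let's see how LangChain can help us get the same things done faster.</p>
<p>If you'd rather not create a new virtual environment, you can install the packages in the same directory:</p>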
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ pip install langchain langchain_community langchain_chroma pypdf</span><br></span></code></pre></div></div>
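<p>You can probably guess why the packages whose names start with langchain are needed. The last one, <code>pypdf</code>, will be used later to parse PDF files, so we might as well install it now. Create a new file named <code>rag_v2.py</code> and let's start from scratch:</p>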
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llms </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Ollama</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">document_loaders </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> TextLoader</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Ollama</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">loader </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> TextLoader</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
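<p>Here we use the third-party package provided by the LangChain community to create the LLM object, and use <code>TextLoader</code> to read the contents of <code>about_me.txt</code>, so we no longer need to call <code>open()</code> ourselves as we did earlier. Next, let's split the text into small chunks:</p>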
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">text_splitter </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RecursiveCharacterTextSplitter</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">text_splitter </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RecursiveCharacterTextSplitter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">20</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    chunk_overlap</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    separators</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">" "</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">","</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" 
style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">splited_docs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> text_splitter</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">loader</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
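<p>To get a feel for what <code>chunk_size</code> and <code>chunk_overlap</code> control, here is a deliberately simplified sketch. It is not how <code>RecursiveCharacterTextSplitter</code> works internally: the real splitter also tries the given separators, so its output will differ. The <code>sliding_chunks()</code> helper is hypothetical and only cuts by character count.</p>

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    # Each chunk starts (chunk_size - chunk_overlap) characters after the
    # previous one, so neighbouring chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    if step not in range(1, chunk_size + 1):
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")

    chunks = []
    start = 0
    while text[start:]:
        chunks.append(text[start:start + chunk_size])
        if not text[start + chunk_size:]:
            break  # this chunk already reached the end of the text
        start += step
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

<p>The last character of each chunk reappears as the first character of the next one. That overlap is what keeps a sentence cut at a chunk boundary from losing its context entirely.</p>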
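<p>Here we borrow <code>RecursiveCharacterTextSplitter</code> to split the text into small chunks. The idea is much the same as the <code>parse_paragraph()</code> function we wrote by hand earlier, but it is far more capable. Next, let's convert these chunks into vectors:</p>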
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OllamaEmbeddings</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OllamaEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"nomic-embed-text"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
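<p>This step is much like the <code>calc_embedings()</code> function we wrote earlier, except that here <code>OllamaEmbeddings</code> computes the vectors for us. I switched to the <code>nomic-embed-text</code> model because it computes embeddings faster, though you can of course keep using <code>mistral</code> if you prefer. If you haven't installed <code>nomic-embed-text</code> yet, you'll need to ask Ollama to pull it down. Fortunately it is much smaller than <code>mistral</code>, only a bit over 200 MB:</p>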
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ ollama pull nomic-embed-text</span><br></span></code></pre></div></div>
<p>Back to the code. Let's store the computed vectors:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectorstores </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Chroma</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">#  ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">vector_db </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Chroma</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    documents</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">splited_docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embedding</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    persist_directory</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"db"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    collection_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"about"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" 
style="color:#393A34">)</span><br></span></code></pre></div></div>
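<p>This step is much like our hand-rolled <code>cache_embeddings()</code> function, except that here <code>Chroma</code> stores the vectors for us. <code>persist_directory</code> specifies the directory where the vectors are stored. I've set it to <code>db</code> for now, but you can change it to anything you like; when the program runs, you'll see this directory created automatically under the project directory. <code>collection_name</code> specifies the name of the collection these vectors belong to.</p>
<p>Finally, turn the vector store into a retriever and prepare the prompt template:</p>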
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">prompts </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatPromptTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vector_db</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">search_kwargs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"k"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">system_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"現在開始使用我提供的情境來回答，只能使用繁體中文，不要有簡體中文字。如果你不確定答案，就說不知道。情境如下:\n\n{context}"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">prompt_template </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatPromptTemplate</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_messages</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> system_prompt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"問題: {input}"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
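<p>Earlier we hand-rolled a <code>calc_similar_vectors()</code> function to compute vector similarity. Here the retriever takes care of all that for us, and I'm sure it does a better job than our own version. The <code>"k": 3</code> in <code>search_kwargs</code> asks for the three most similar chunks. We also use <code>ChatPromptTemplate</code> to build the prompt template: <code>{context}</code> is where the retrieved chunks go and <code>{input}</code> is the user's question. If you read through it, it shouldn't feel too unfamiliar.</p>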
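<p>As a reminder of what "return the <code>k</code> most similar chunks" means, here is a small pure-Python sketch using cosine similarity. The names <code>cosine_similarity()</code> and <code>top_k()</code> and the record layout are assumptions for illustration only; the distance metric and index that <code>Chroma</code> actually uses may well be different.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vector, records, k=3):
    """Return the k records whose vectors are most similar to query_vector."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query_vector, record["vector"]),
        reverse=True,
    )
    return ranked[:k]
```

<p>A real vector database avoids comparing the query against every stored vector, but the result it aims for is the same kind of ranked shortlist.</p>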
<p>Finally, assemble all the pieces and we can start chatting:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chains</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">combine_documents </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> create_stuff_documents_chain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chains </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> create_retrieval_chain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">document_chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_stuff_documents_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt_template</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retrieval_chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_retrieval_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> document_chain</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">context </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">input_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span 
class="token string" style="color:#e3116c">"您想問什麼問題？\n&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> input_text</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">!=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bye"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> retrieval_chain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> input_text</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> context</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    context </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"answer"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    input_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
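<p>Conceptually, the two chains above do three things per question: look up documents, paste ("stuff") them into the <code>{context}</code> slot of the prompt, and send the result to the model. The sketch below imitates that flow with plain functions. <code>answer()</code>, <code>fake_retrieve()</code> and <code>fake_llm()</code> are hypothetical stand-ins, not LangChain APIs, and joining documents with a blank line is an assumption about the default behaviour.</p>

```python
def answer(question, retrieve, llm, system_template, user_template="問題: {input}"):
    """Retrieve documents, stuff them into the prompt, then ask the model."""
    docs = retrieve(question)  # a fresh lookup happens on every call
    context = "\n\n".join(docs)  # "stuff" = concatenate every document
    prompt = (
        system_template.format(context=context)
        + "\n"
        + user_template.format(input=question)
    )
    # Same three keys the loop above reads from: input, context, answer.
    return {"input": question, "context": docs, "answer": llm(prompt)}


# Stand-ins so the sketch runs without a model or a vector store.
def fake_retrieve(question):
    return ["doc one", "doc two", "doc three"]


def fake_llm(prompt):
    return "echo: " + prompt


result = answer("Who is it?", fake_retrieve, fake_llm, "Context:\n\n{context}")
print(result["answer"])
```

<p>Note that in this sketch the context is rebuilt from the retriever on every call. As far as I can tell, <code>create_retrieval_chain</code> behaves the same way, so the <code>context</code> value passed into <code>invoke()</code> in the loop above is probably overwritten rather than carried over as chat history.</p>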
<p>Let's run it:</p>
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">python rag_v2.py</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">您想問什麼問題？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 高見龍是誰？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">回答：高見龍是一位愛寫程式的電腦阿宅，並且希望可以寫一輩子程式。他是父母給他的本名，這個名字看起來有點像武俠小說的名字，但其實不是筆名。他很喜歡自己的名字。</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 他有出版過什麼書？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">回答: 不知道。</span><br></span></code></pre></div></div>
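<p>It doesn't know which books he has published. I can't tell whether the model isn't smart enough or the data we supplied just isn't sufficient, but either way, answering "I don't know" is much better than making something up.</p>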
<p>Here is the complete code:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v2.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llms </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Ollama</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">document_loaders </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> TextLoader</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">text_splitter </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RecursiveCharacterTextSplitter</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> 
OllamaEmbeddings</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectorstores </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Chroma</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">prompts </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatPromptTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chains</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">combine_documents </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> create_stuff_documents_chain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chains </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> create_retrieval_chain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Ollama</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">loader </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> TextLoader</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"about_me.txt"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">text_splitter </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RecursiveCharacterTextSplitter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">20</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    chunk_overlap</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">5</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    separators</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">" "</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">","</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">splited_docs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> text_splitter</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">loader</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OllamaEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"nomic-embed-text"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">vector_db </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Chroma</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    documents</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">splited_docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embedding</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    persist_directory</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"db"</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    collection_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"about"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vector_db</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">search_kwargs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"k"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">system_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"現在開始使用我提供的情境來回答，只能使用繁體中文，不要有簡體中文字。如果你不確定答案，就說不知道。情境如下:\n\n{context}"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">prompt_template </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatPromptTemplate</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_messages</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> system_prompt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"問題: {input}"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">document_chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_stuff_documents_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt_template</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retrieval_chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_retrieval_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> document_chain</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">context </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">input_text </span><span class="token operator" style="color:#393A34">=</span><span class="token 
plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"您想問什麼問題？\n&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> input_text</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">!=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bye"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> retrieval_chain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> input_text</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> context</span><span 
class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    context </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"answer"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    input_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
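<p>One setting in the listing worth experimenting with is <code>chunk_size=20</code> with <code>chunk_overlap=5</code>, since it produces very short chunks. The sketch below shows what those two numbers mean using a plain fixed-width sliding window. It is a simplification and not the algorithm <code>RecursiveCharacterTextSplitter</code> uses (which also splits on the given separators); <code>sliding_chunks()</code> is a hypothetical helper.</p>

```python
def sliding_chunks(text, chunk_size=20, chunk_overlap=5):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "abcdefghijklmnopqrstuvwxyz" * 2  # 52 characters
for chunk in sliding_chunks(sample):
    print(repr(chunk))
```

<p>With windows this small, a single chunk rarely holds a complete sentence, so it may be worth trying larger values before concluding that the model or the data is the limiting factor.</p>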
<h3 class="anchor anchorWithStickyNavbar_XzRG" id="讀取-pdf-檔案">Reading PDF Files<a href="https://pythonbook.cc/articles/2024/9/11/rag-workshop#%E8%AE%80%E5%8F%96-pdf-%E6%AA%94%E6%A1%88" class="hash-link" aria-label="Direct link to Reading PDF Files" title="Direct link to Reading PDF Files">​</a></h3>
<p>So far we have only worked with plain-text files, but handling a PDF is not hard either. Just swap in a different Loader:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v3.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">document_loaders </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> PyPDFLoader</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ... 
略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">loader </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> PyPDFLoader</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"interview.pdf"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">splited_docs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> loader</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load_and_split</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
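<p>To get a feel for what the splitting step does, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is not LangChain's implementation: when no splitter is passed, <code>load_and_split()</code> falls back to a <code>RecursiveCharacterTextSplitter</code>, which prefers to break on separators such as blank lines. The names <code>split_into_chunks</code>, <code>chunk_size</code> and <code>overlap</code> are hypothetical and exist only for this illustration.</p>

```python
def split_into_chunks(text, chunk_size=10, overlap=3):
    """Cut text into pieces of up to chunk_size characters.

    Neighbouring pieces share `overlap` characters, so text near a
    boundary shows up in both pieces. Illustration only; this is not
    how LangChain splits documents.
    """
    step = chunk_size - overlap
    if step not in range(1, chunk_size + 1):
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    if not text:
        return []
    # Stop once the remaining tail is already covered by the previous piece.
    last_start_limit = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start_limit, step)]


chunks = split_into_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

<p>Each piece would then be embedded and stored on its own, which is why retrieval can return only the few passages relevant to a question instead of the whole PDF.</p>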
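<p>Here I prepared a PDF document named <code>interview.pdf</code> and switched to <code>PyPDFLoader</code> to read it, calling <code>load_and_split()</code> to split the PDF's content into small chunks. Also, because I don't want all the vector data stored in the same collection, I changed <code>collection_name</code> to <code>interview</code>:</p>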
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v3.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># ... 略 ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">vector_db </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Chroma</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    documents</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">splited_docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embedding</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    persist_directory</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"db"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    collection_name</span><span 
class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"interview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
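<p>The reason for a separate collection can be sketched with a toy store. This is a hypothetical stand-in, not Chroma's API: <code>ToyVectorStore</code>, its methods, and the <code>notes</code> collection are made up for illustration, and a plain keyword match stands in for vector similarity search.</p>

```python
class ToyVectorStore:
    """Hypothetical stand-in for a persisted vector database (not Chroma)."""

    def __init__(self):
        # One list of stored texts per collection name.
        self._collections = {}

    def add(self, collection_name, texts):
        self._collections.setdefault(collection_name, []).extend(texts)

    def search(self, collection_name, keyword):
        # A search only ever looks inside the named collection.
        stored = self._collections.get(collection_name, [])
        return [text for text in stored if keyword in text]


store = ToyVectorStore()
store.add("notes", ["Chunk from an earlier text file"])
store.add("interview", ["Each quick interview lasts 8 minutes"])

print(store.search("interview", "8"))  # ['Each quick interview lasts 8 minutes']
print(store.search("notes", "8"))      # []
```

<p>Because both collections would live in the same <code>db</code> directory, giving the PDF its own name keeps chunks from earlier experiments out of its search results.</p>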
<p>Let's run it:</p>
<div class="language-text codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-text codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token plain">$ python rag_v3.py</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">您想問什麼問題？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 什麼是快速面試？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">快速面試是一種幫我們的培訓班學員找新工程師工作的媒合活動，旨在幫助企業找到合適的工程師，同時又能夠讓學員們找到適合的新工作機會。這個活動可線上或是實體進行，每家企業與每位求職者進行8分鐘的面試，活動結束後我們會提供通過快速面試的求職者資訊給企業。這個活動是免費進行的，需要準備相關公司和職缺資料以及對面試者的能力進行有效的評估。</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt;&gt; 下次什麼時候舉辦？</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">答案：下一場快速⾯試預計在2024年10月2日（星期三）舉辦。</span><br></span></code></pre></div></div>
<p>Looks good!</p>
<p>Full code:</p>
<div class="language-py codeBlockContainer__dxy theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_JhIl">檔案：rag_v3.py</div><div class="codeBlockContent_gluQ"><pre tabindex="0" class="prism-code language-py codeBlock_Kk3v thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_VdF_"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llms </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Ollama</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">document_loaders </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> PyPDFLoader</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OllamaEmbeddings</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectorstores </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> 
Chroma</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">prompts </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatPromptTemplate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chains</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">combine_documents </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> create_stuff_documents_chain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chains </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> create_retrieval_chain</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Ollama</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"mistral"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">loader </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> PyPDFLoader</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"interview.pdf"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">splited_docs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> loader</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load_and_split</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">embeddings </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OllamaEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"nomic-embed-text"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">vector_db </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> 
Chroma</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    documents</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">splited_docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    embedding</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    persist_directory</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"db"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    collection_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"interview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vector_db</span><span 
class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">search_kwargs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"k"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">system_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"現在開始使用我提供的情境來回答，只能使用繁體中文，不要有簡體中文字。如果你不確定答案，就說不知道。情境如下:\n\n{context}"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">prompt_template </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatPromptTemplate</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_messages</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" 
style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> system_prompt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"問題: {input}"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">document_chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_stuff_documents_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> prompt_template</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retrieval_chain </span><span 
class="token operator" style="color:#393A34">=</span><span class="token plain"> create_retrieval_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> document_chain</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">context </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">input_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"您想問什麼問題？\n&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> input_text</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" 
style="color:#393A34">!=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bye"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> retrieval_chain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> input_text</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> context</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    context </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"answer"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    input_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"&gt;&gt;&gt; "</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>]]></content>
        <author>
            <name>高見龍</name>
            <uri>https://kaochenlong.com</uri>
        </author>
        <category label="LLM" term="LLM"/>
        <category label="Python" term="Python"/>
        <category label="RAG" term="RAG"/>
        <category label="LangChain" term="LangChain"/>
        <category label="Workshop" term="Workshop"/>
    </entry>
</feed>