QEUR23_LMABYS15:　Llama2の精度評価（LangChain-日本語-その１）

8月 21, 2023

～　日本語もかなりイケル！！　～

D先生： “前回までは英語を中心にトライアルしてきました。しかし、Llama2のLLM（大規模言語モデル）は多国語に対応しているはずでしたよね。”

QEU:FOUNDER ： “前回の「（LCなしで）日本語だけを使う」という方法では、回答のパフォーマンスが悪くなったわけです。”

(Llama-v2-13b-GPTQ：日本語->LangChainはない)

D先生： “特に、英語の回答が多かったのでポイントを落としましたね。”

D先生： “今回は、日本語の質問に対してLangChainを使います。その結果はどうなりますかね？”

QEU:FOUNDER ： “小生もすこしワクワクします。それでは、プログラムをドン！コードは、実際には英語版とは変わらないですが・・・。”

# llama2-13B-chat-GPTQによる連続回答プログラム(LangChainつき)
import pandas as pd
import numpy as np
# LLM
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
# LangChain
from langchain import HuggingFacePipeline, PromptTemplate, LLMChain
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

#Load all the .txt files from docs directory
loader = DirectoryLoader('/content/drive/MyDrive/LCdocs/',glob = "**/*.txt")
documents = loader.load()
len(documents)

# -----
print(documents[0].page_content[0:1500])


# -----
# 言語情報をChunk分割する
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64, separators=["\n\n", "\n", "。", "，"],)
texts = splitter.split_documents(documents)
len(texts)

# -----
print(texts[30].page_content)

# -----
# モデル名
model_name_or_path = "TheBloke/h2ogpt-4096-llama2-13B-chat-GPTQ"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

"""
# To download from a specific branch, use the revision parameter, as in this example:
# Note that `revision` requires AutoGPTQ 0.3.1 or later!

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        revision="gptq-4bit-32g-actorder_True",
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        quantize_config=None)
"""

# -----
# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

#print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0,            # GPUを使うことを指定 (cuda:0と同義)
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.95,
    repetition_penalty=1.15
)



#################
##################################


model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
db = Chroma.from_documents(texts, embeddings, persist_directory="db")


######################
##################

llm = HuggingFacePipeline(
    pipeline = pipe, 
    model_kwargs = {"temperature": 0.2}
)

# Build prompt
template = """question（質問）に見慣れない名詞があった場合、以下のcontext(文脈)を利用して、questionのあとに答えなさい。answer(答え)がわからない場合は答えを作ろうとせず、わからないと答えなさい。answerは3文または20語以内に書きなさい。
{context}
Question: {question}
Answer:"""


QA_CHAIN_PROMPT = PromptTemplate.from_template(template)   # Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    verbose=False,
)


#################
##################################

# -----
#Ask Simple Questions
question = "米国の最高法は何ですか?"
result = qa_chain({"query": question})
#print(result)

# -----
print(result["result"])


# -----
# print(result["source_documents"][0])
# Retrieve the page number from the source document
list_content = list(result['source_documents'][0])
#print(list_content)

# -----
# 整形する
str_content = list_content[0]
#print(str_content)

str_content = str_content[1]
print(str_content)

######################
##################


# ----
# 質問の文字列リストを定義する
df = pd.read_excel("/content/drive/MyDrive/LCdocs/QEU-verify100-ja2(en).xlsx")
#print(df)
len(df)

# 日本語のQA情報を読み込む
question_list = df["instructions(jp)"].values
#print(question_list[0:20])

reference_list = df["response(jp)"].values
#print(referece_list[0:20])

type_list = df["type"].values
print(type_list[0:20])


######################
##################

# -----
# 個別の質問応答のタスクを実行する関数を定義する
def repeat_qa(question):

    # 質問に回答する
    print(f"No.{i}: Question: {question}")

    # Ask Questions
    res = qa_chain( question )
    answer = res["result"]

    # -----
    # Retrieve the page number from the source document
    list_content = list(res['source_documents'][0])
    #print(list_content)
    # -----
    # 整形する
    arr_content = list_content[0]
    #print(arr_content)
    str_content = arr_content[1]
    #print(str_content)

    # -----
    # 答えを表示する
    print(f"Answer: {answer}")
    print(f"Content: {str_content}")
    
    return answer, str_content

# 繰り返し質問する
mx_output = []
for i in range(100):     # len(df)

    # 今回の質問を抽出する
    question  = question_list[i]
    reference = reference_list[i]
    val_type  = type_list[i]
    # 質問に回答する
    answer, str_content = repeat_qa(question)
    # 質問と回答のマトリックス（リスト）を生成する
    mx_output.append([question, answer, reference, val_type, str_content])

# リストを出力する
#print("--------")
#print("generate answer list")
#print(answer_output)
print("--------")
print("generate output matrix")
print(np.array(mx_output[0:5]))


# -----
# Create DataFrame
mx_out = np.array(mx_output)
columns1 = ["Question", "Answer", "Reference", "Type", "Context"]
df_out = pd.DataFrame(data=mx_out, columns=columns1)
print(df_out)

# EXCEL ファイル (USverify100_Llama_jp(LC).xlsx) として出力
df_out.to_excel("/content/drive/MyDrive/USverify100_Llama_jp(LC).xlsx")

QEU:FOUNDER ： “まずは質問への回答を見てみましょう。CONTEXTを見ていると、sentense-transformerが必要な情報を抽出しているとはいいがたいが・・・。”

QEU:FOUNDER ： “何事も結果がすべて、よかったよかった・・・（笑）。”

D先生： “私も、（パフォーマンスの高さに）びっくり・・・。どんな「プロンプト・エンジニアリング」を使ったんでしたっけ？”

template = """question（質問）に見慣れない名詞があった場合、以下のcontext(文脈)を利用して、questionのあとに答えなさい。answer(答え)がわからない場合は答えを作ろうとせず、わからないと答えなさい。answerは3文または20語以内に書きなさい。
{context}
Question: {question}
Answer:"""

QEU:FOUNDER ： “プロンプト・エンジニアリングを日本語ベースにしたので、LLMも日本語で答えるようになったんでしょう。それでは例によって、NG事例を確認してみましょう。Open_QAから・・・。”

D先生： “(Open_QAでは)実質上は正解のモノが多いです。少量ながら、回答を英語で出力したので不正解になっているものもあります。”

QEU:FOUNDER ： “次はClassificationのNG事例を見てみましょう。”

QEU:FOUNDER ： “はっきりいって、このLLMの日本語の言語能力は「まだまだ」だと感じました。”

D先生： “しようがないですね。あのLlama2でも、日本語コーパスのシェアは5%もないでしょう。・・・となると、この手の話題を日本語のコーパスが取り上げている可能性はほとんどないわけで・・・。”

QEU:FOUNDER ： “はっきり言って、今回の成績には小生はかなり満足しているのだが、もうすこしだけ上の目指してみましょう。次はContextの中身を改造します。”

D先生： “どうやって？”

QEU:FOUNDER ： “それは次回のお楽しみ・・・。いよいよ、QEUも「マルチリンガル」の世界に入ります。”

～　まとめ　～

QEU:FOUNDER ： “例のMMT（現代貨幣理論）の話を補足したい。”

C部長 : “あれって、「J国スゴイ」がベースとなっている理論でしょ？それゆえに、今現在、「論理的に行き詰まっている」という・・・。”

QEU:FOUNDER ： “実は、そのお手本のA国も行き詰っているのですよ。”

C部長 : “石油のドル覇権の問題ですよね。これが現実化すると、A国もMMTどころではないです。”

QEU:FOUNDER ： “間違ってはいけないのは、これはC国の陰謀ではなく、中東をはじめとしたグローバル・サウスの自身の判断なんですよ・・・。”

C部長 : “時代の大きな節目なんでしょうね。”

このブログを検索

QEU_ROUND2-3：　大規模言語モデル上にQEUシステムを構築する

QEUR23_LMABYS15:　Llama2の精度評価（LangChain-日本語-その１）

～　日本語もかなりイケル！！　～

～　まとめ　～

コメント

コメントを投稿

このブログの人気の投稿

QEUR23_LMABYS2:　閑話休題～ブログをつくるとわかること

QEUR23_LMABYS7: LangChainで本格的に性能を評価する

QEUR23_LMABYS16:　Llama2の精度評価（LangChain-日本語-その2）

QEUR23_LMABYS15: Llama2の精度評価（LangChain-日本語-その１）

～ 日本語もかなりイケル！！ ～

～ まとめ ～

コメント

コメントを投稿

このブログの人気の投稿

QEUR23_LMABYS2: 閑話休題～ブログをつくるとわかること

QEUR23_LMABYS7: LangChainで本格的に性能を評価する

QEUR23_LMABYS16: Llama2の精度評価（LangChain-日本語-その2）

QEUR23_LMABYS15:　Llama2の精度評価（LangChain-日本語-その１）

～　日本語もかなりイケル！！　～

～　まとめ　～

QEUR23_LMABYS2:　閑話休題～ブログをつくるとわかること

QEUR23_LMABYS16:　Llama2の精度評価（LangChain-日本語-その2）