

Chatbot LLMs Gatotkaca.AI #Part 2.1


Goal

At this stage, our goal is to build a knowledge base that will serve as the foundation for our LLM. In the context of Large Language Models (LLMs), a knowledge base is a structured repository of information. The model can access it, or be trained on it, to generate more accurate, relevant, and contextually appropriate responses. In short, the knowledge base is the central source of information our LLM draws on when it interacts with the user.

Steps

  1. Pull data from the API using requests
  2. Normalize the data using numpy and pandas
  3. Load the data into our database

Actions

The API that will serve as the foundation of our knowledge base is Open-Meteo. It is an open-source weather API that is free for non-commercial use. To retrieve weather data for a place, we need to specify its longitude and latitude.

Longitude and latitude are coordinates used to specify locations on Earth’s surface.

  • Latitude measures how far north or south a point is from the Equator, ranging from 0° at the Equator to 90° at the poles.
  • Longitude measures how far east or west a point is from the Prime Meridian, ranging from 0° at the Prime Meridian to 180° east or west.
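The helper below shows these ranges in code. It is hypothetical and not part of the original pipeline. It rejects coordinates outside the valid ranges before they are sent to the weather API. It assumes decimal degrees, with south and west written as negative numbers. The Jakarta values are rounded approximations used only for illustration.

```python
def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Return True if the pair lies within the valid latitude/longitude ranges."""
    # Latitude: -90 (South Pole) to 90 (North Pole); the Equator is 0
    # Longitude: -180 (180° West) to 180 (180° East); the Prime Meridian is 0
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Jakarta lies just south of the Equator and far east of the Prime Meridian,
# so its latitude is negative and its longitude positive (rounded values)
print(is_valid_coordinate(-6.2, 106.8))  # True
print(is_valid_coordinate(-6.2, 206.8))  # False: longitude out of range
```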

Before we can look up the longitude and latitude, we need to load the list of province capitals we saved in the previous part:

```python
import numpy as np

# Load the province capitals saved in the previous part
indProv = np.load('location file.npy', allow_pickle=True)
```
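The later code indexes entries as indProv[i]['ibukota']. That suggests the file holds a NumPy object array of Python dictionaries. np.load refuses to unpickle object arrays unless allow_pickle=True is set. The round trip below assumes that structure. It uses a throwaway file and a hypothetical two-entry sample instead of the real location file.npy.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample mirroring the assumed shape of indProv: an object array of dicts
provinces = np.array([{"ibukota": "Jakarta"}, {"ibukota": "Banda Aceh"}], dtype=object)

path = os.path.join(tempfile.mkdtemp(), "sample_locations.npy")
np.save(path, provinces)  # object arrays are pickled on save (allow_pickle defaults to True)

try:
    np.load(path)  # allow_pickle defaults to False here, so this raises ValueError
except ValueError as err:
    print("Without allow_pickle:", err)

loaded = np.load(path, allow_pickle=True)
print(loaded[1]["ibukota"])  # Banda Aceh
```

Only load pickled files you created yourself. Unpickling can execute arbitrary code, which is why NumPy makes this opt-in.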

Next, we create a function called get_json_from_capital. It looks up a capital by name with the Open-Meteo geocoding API and returns the JSON response.

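```python
import requests

def get_json_from_capital(value):
    # Look up one place name with the Open-Meteo geocoding API (top match only)
    url = "https://geocoding-api.open-meteo.com/v1/search"
    params = {"name": value, "count": 1, "language": "en", "format": "json"}
    response = requests.get(url, params=params, timeout=30)  # params are URL-encoded for us
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```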
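Some capital names contain spaces, such as Banda Aceh. A name containing & would also split the query string. The name should therefore be URL-encoded before it goes into the geocoding URL. requests usually percent-encodes spaces in a URL string on its own. Encoding explicitly, or passing a params dict to requests.get, removes the guesswork. The sketch below uses only the standard library. build_geocoding_url is a hypothetical helper, not part of the original code.

```python
from urllib.parse import urlencode

GEOCODING_URL = "https://geocoding-api.open-meteo.com/v1/search"


def build_geocoding_url(name: str) -> str:
    """Build the geocoding search URL with a properly encoded place name."""
    query = urlencode({"name": name, "count": 1, "language": "en", "format": "json"})
    return f"{GEOCODING_URL}?{query}"


print(build_geocoding_url("Banda Aceh"))
# https://geocoding-api.open-meteo.com/v1/search?name=Banda+Aceh&count=1&language=en&format=json
```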

To run get_json_from_capital for every province capital, add this code:

```python
# Query the geocoding API once per province capital (ibukota)
result_json = []
for prov in indProv:
    result_json.append(get_json_from_capital(prov['ibukota']))
```
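The merge step that follows pairs result_json[i] with indProv[i] by position. In the loop above, an exception on any single capital aborts the whole loop. The variation below is a suggestion, not the original code. It wraps each call and appends an empty dict on failure, so the positions still line up. The fetch argument stands in for get_json_from_capital. requests' exception classes derive from OSError, and an invalid JSON body raises a ValueError subclass, so those two are caught.

```python
from typing import Callable


def geocode_all(capitals: list[str], fetch: Callable[[str], dict]) -> list[dict]:
    """Call fetch for every capital, keeping exactly one result per input position."""
    results = []
    for name in capitals:
        try:
            results.append(fetch(name))
        except (OSError, ValueError) as err:
            # Keep the slot so results[i] still corresponds to capitals[i]
            print(f"Lookup failed for {name}: {err}")
            results.append({})
    return results


# Offline demo with a fake fetcher that fails for one name
def fake_fetch(name: str) -> dict:
    if name == "Nowhere":
        raise OSError("simulated network error")
    return {"results": [{"name": name}]}


results = geocode_all(["Jakarta", "Nowhere", "Bandung"], fake_fetch)
print(len(results))  # 3
print(results[1])    # {}
```

In the real pipeline, this would be called as `result_json = geocode_all([p['ibukota'] for p in indProv], get_json_from_capital)`. An empty dict has no 'results' key, so the merge step's else branch reports it.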

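The flow of the code above is simple. For each entry in indProv, we send the capital name (ibukota) to the geocoding API and append the JSON response to result_json. As a result, result_json[i] holds the lookup for indProv[i].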

To combine result_json with our master list of province capitals (indProv), we iterate over the results. For each one, we add country, latitude, and longitude keys to the matching entry in indProv.

```python
# Copy country and coordinates from each geocoding result into indProv
for index, data in enumerate(result_json):
    # Each response is already a dict, so no json.dumps/json.loads round trip is needed
    if data.get('results'):
        res = data['results'][0]
        indProv[index]['country'] = res['country']
        indProv[index]['latitude'] = res['latitude']
        indProv[index]['longitude'] = res['longitude']
    else:
        # The API found no match for this capital; report it so it can be fixed by hand
        print(f"No geocoding result for index {index}: {indProv[index]['ibukota']}")
```
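Because count=1 keeps only the top search hit, a capital whose name also exists elsewhere could resolve to a place outside Indonesia. The merge step already stores country, so a quick check can flag those entries for manual review. This is a sketch under two assumptions. The helper name is hypothetical, and the expected value "Indonesia" assumes the English country name, since the request uses language=en.

```python
def find_foreign_matches(provinces, expected_country="Indonesia"):
    """Return (index, capital, country) for entries geocoded outside expected_country."""
    flagged = []
    for index, prov in enumerate(provinces):
        country = prov.get("country")
        # Entries without a country were never geocoded; skip them here
        if country is not None and country != expected_country:
            flagged.append((index, prov["ibukota"], country))
    return flagged


# Illustrative data only, not real API output
sample = [
    {"ibukota": "Jakarta", "country": "Indonesia"},
    {"ibukota": "Example City", "country": "Elsewhere"},
    {"ibukota": "Unmatched"},
]
print(find_foreign_matches(sample))  # [(1, 'Example City', 'Elsewhere')]
```

The same call works on indProv directly, because iterating over a NumPy object array yields the stored dictionaries.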


Don’t give up—if you feel confused or don’t fully understand, it’s a sign that you’re still learning and making an effort to grasp it.

The date range we request from the weather API runs from 1 January 2023 to August 2024 (start_date=2023-01-01 and end_date=2024-08-18 in the URL below). Create a new function called get_weather_from_longlat:

```python
import requests

def get_weather_from_longlat(longitude, latitude):
    # Request daily weather variables for one location between start_date and end_date
    url = f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&daily=weather_code,temperature_2m_max,temperature_2m_min,sunrise,sunset,daylight_duration,sunshine_duration,wind_speed_10m_max,wind_direction_10m_dominant&timezone=Asia%2FBangkok&start_date=2023-01-01&end_date=2024-08-18"
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```
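A quick sanity check can be run on each response. For daily data, Open-Meteo returns the dates in daily["time"] alongside one list per requested variable. The length of that list should therefore match the number of days requested. The sketch below counts the days and compares them with the response. It treats both start_date and end_date as included, which is an assumption worth confirming against a real response. Both helpers are hypothetical.

```python
from datetime import date


def expected_days(start_date: str, end_date: str) -> int:
    """Number of calendar days from start_date to end_date, both inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return (end - start).days + 1


def has_full_range(weather_data: dict, start_date: str, end_date: str) -> bool:
    """True if the response's daily["time"] list covers every requested day."""
    times = weather_data.get("daily", {}).get("time", [])
    return len(times) == expected_days(start_date, end_date)


# 365 days in 2023 plus 231 days from 1 January to 18 August 2024 (a leap year)
print(expected_days("2023-01-01", "2024-08-18"))  # 596
```

If a request fails and the saved payload has no daily block, has_full_range returns False as well. This makes it easy to spot files that need to be fetched again.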

Then we apply that function to every entry in indProv and save each response from Open-Meteo to its own .json file.

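```python
import json
import os
import time

# Make sure the output folder exists before writing into it
os.makedirs('temp/json_history', exist_ok=True)

for index, prov in enumerate(indProv):
    # Skip capitals that the geocoding step could not resolve
    if 'latitude' not in prov or 'longitude' not in prov:
        continue
    weather_data = get_weather_from_longlat(prov['longitude'], prov['latitude'])
    capital = prov['ibukota']
    with open(f'temp/json_history/{index + 1}_{capital}.json', 'w') as json_file:
        json.dump(weather_data, json_file, indent=4)

    # Wait 90 seconds between requests to avoid overloading the free API
    time.sleep(90)
```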
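This part stops after saving the raw responses. The sketch below previews one possible approach to the normalize and load steps listed at the start, using only the standard library. Open-Meteo's daily block is column-oriented: a time list plus one list per variable. Each file can therefore be reshaped into (capital, date, variable, value) rows and written to a table. Several choices here are assumptions for illustration: SQLite, the table name daily_weather, and the long row layout. The post itself plans to use numpy and pandas for normalization and does not name its database.

```python
import json
import sqlite3
from pathlib import Path


def daily_to_rows(capital: str, weather_data: dict) -> list[tuple]:
    """Reshape Open-Meteo's column-oriented "daily" block into one row per day and variable."""
    daily = weather_data["daily"]
    rows = []
    for i, day in enumerate(daily["time"]):
        for variable, values in daily.items():
            if variable != "time":
                rows.append((capital, day, variable, values[i]))
    return rows


def load_rows(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Create the hypothetical daily_weather table if needed and upsert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_weather ("
        "capital TEXT, date TEXT, variable TEXT, value, "
        "PRIMARY KEY (capital, date, variable))"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_weather VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def load_history_folder(conn: sqlite3.Connection, folder: str = "temp/json_history") -> None:
    """Load every saved <index>_<capital>.json file that contains a daily block."""
    for path in sorted(Path(folder).glob("*.json")):
        capital = path.stem.split("_", 1)[1]  # file names follow f"{index + 1}_{capital}.json"
        with open(path) as json_file:
            data = json.load(json_file)
        if "daily" in data:
            load_rows(conn, daily_to_rows(capital, data))


# Illustrative, made-up response with two days and two variables
sample = {"daily": {"time": ["2023-01-01", "2023-01-02"],
                    "temperature_2m_max": [31.0, 30.5],
                    "temperature_2m_min": [24.1, 23.8]}}
conn = sqlite3.connect(":memory:")
load_rows(conn, daily_to_rows("Jakarta", sample))
print(conn.execute("SELECT COUNT(*) FROM daily_weather").fetchone()[0])  # 4
```

The long layout means new daily variables never require a schema change. A wide table, with one column per variable, would be easier to query directly.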