Piotr Parkitny

Logo

Explore my data-driven projects and analyses that cover everything from machine learning to data visualization.

View the Project on GitHub pparkitn/pparkitn

How to Create a New Dataset on Kaggle

Creating a new dataset on Kaggle is a straightforward process. Follow these steps to create and upload your dataset to Kaggle. You can use this guide to create a new dataset or update an existing one.

Create a New Dataset

  1. Prepare Your Dataset Files: Organize your dataset files into a folder on your local machine. This folder should contain all the files you want to upload to Kaggle.

  2. Install Kaggle Python Package: If you haven’t already installed the Kaggle Python package, run the following command to install it using pip:

   pip3 install kaggle

nitialize Dataset: Use the following command to initialize your dataset and generate a metadata file:

kaggle datasets init -p /path/to/dataset

Replace /path/to/dataset with the actual path to your dataset folder. Add Metadata: Open the generated metadata file named datapackage.json and add your dataset’s metadata. This includes the dataset’s title, description, and other relevant information.

{
  "title": "INSERT_TITLE_HERE",
  "id": "dsptlp/INSERT_SLUG_HERE",
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ]
}

If you have folers in your dataset zip them up so only files are present, no folders.

zip -r name.zip filder

Create the Dataset: Now, run the following command to create your dataset on Kaggle:

kaggle datasets create -p /path/to/dataset

Kaggle will process your dataset and make it private by default. If you want to make it public right away, you can add the -u flag to the kaggle datasets create command.

Public or Collaborative Sharing: If your dataset is private and you wish to make it public or share it with collaborators, you can do so from your dataset’s settings on the Kaggle website. Navigate to “Settings” > “Sharing” to manage your dataset’s visibility and access.

Create a New Dataset Version

If you already have an existing dataset on Kaggle and want to upload a new version of it, follow these steps:

Initialize New Version: If you don’t have a metadata file for the new version, create one by running:

kaggle datasets init -p /path/to/dataset

Ensure that the id field in the dataset-metadata.json or datapackage.json points to your dataset.

Upload New Version: Run the following command to create a new version of your dataset:

kaggle datasets version -p /path/to/dataset -m "Your message here"

Replace /path/to/dataset with the actual path to your dataset folder and provide a brief message describing the changes you made for this version.

That’s it! You’ve successfully created a new dataset on Kaggle or uploaded a new version of an existing one. You can manage and share your datasets through Kaggle’s platform as needed.