Split a dataframe into smaller dataframes by column name


So I am doing a Time series/LSTM assignment and I have a stock dataset: https://www.kaggle.com/camnugent/sandp500

The thing is that I need to split the main dataframe into smaller dataframes, one per company name. Is there a fast way to do this? There are tens of company names, and although I have seen that this can be done with iloc, that is too much effort.

import pandas as pd

# parse_dates already converts 'date' to datetime64, so a separate pd.to_datetime call is not needed
df = pd.read_csv('all_stocks_5yr.csv', parse_dates=['date'])

grouped_df = df.groupby('Name')

Here it can be seen more clearly:

[Screenshot of the dataframe]

As you can see, there are many different companies in the Name column. What I want is a separate dataframe for each company. Any help is much appreciated.
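For reference, a common pandas pattern for exactly this is to collect the groups into a dictionary keyed by company name, or to pull a single company out of the groupby object with `get_group`. A minimal sketch on a small hypothetical frame (the tickers and prices below are made up and only stand in for the Kaggle data):

```python
import pandas as pd

# Small hypothetical frame standing in for the stock data (same 'Name' column)
df = pd.DataFrame({
    "Name": ["AAL", "AAL", "AAPL", "AAPL", "AMZN"],
    "close": [10.0, 11.0, 20.0, 21.0, 30.0],
})

grouped_df = df.groupby("Name")

# One DataFrame per company, keyed by its name (a single pass over the data)
dfs = {name: group for name, group in grouped_df}

# Or fetch a single company straight from the groupby object
aapl = grouped_df.get_group("AAPL")

print(sorted(dfs))
print(aapl)
```

With the real file, `dfs["AAL"]` would then hold every row for that ticker, without creating one variable per company by hand.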

dataframe keras lstm pandas
2021-11-23 15:16:49


Assume this is your dataframe:

   Name  price
0   aal      1
1   aal      2
2   aal      3
3   aal      4
4   aal      5
5   aal      6
6   bll      7
7   bll      8
8   bll      9
9   bll      8
10  dll      7
11  dll     56
12  dll      4
13  dll      3
14  dll      3
15  dll      5
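To reproduce the example, the frame above can be built directly (a sketch that includes only the Name and price columns shown):

```python
import pandas as pd

# Rebuilds the 16-row example dataframe shown above
df = pd.DataFrame({
    "Name": ["aal"] * 6 + ["bll"] * 4 + ["dll"] * 6,
    "price": [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 56, 4, 3, 3, 5],
})
print(df)
```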

Then do the following:

# Use "group" rather than "df" as the loop variable so the original df is not overwritten
for name, group in df.groupby('Name'):
    group.to_csv(f"Price_{name}.csv", sep=";")

That saves each sub-dataframe to its own CSV file. To see what the groups look like:

for name, group in df.groupby('Name'):
    print(group)

prints:

  Name  price
0  aal      1
1  aal      2
2  aal      3
3  aal      4
4  aal      5
5  aal      6
  Name  price
6  bll      7
7  bll      8
8  bll      9
9  bll      8
   Name  price
10  dll      7
11  dll     56
12  dll      4
13  dll      3
14  dll      3
15  dll      5
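Because the CSV files from the first loop are written with sep=";", the same separator is needed when reading one back (for example, before feeding a single company's series to the LSTM). A sketch that writes into a temporary directory; the directory and the reduced data are assumptions of this example, chosen so it does not litter the working folder:

```python
import os
import tempfile

import pandas as pd

# Hypothetical subset of the example data
df = pd.DataFrame({
    "Name": ["aal"] * 3 + ["bll"] * 2,
    "price": [1, 2, 3, 7, 8],
})

out_dir = tempfile.mkdtemp()  # temporary output folder for this sketch only

for name, group in df.groupby("Name"):
    group.to_csv(os.path.join(out_dir, f"Price_{name}.csv"), sep=";")

# sep must match the one used when writing; index_col=0 restores the saved index
aal = pd.read_csv(os.path.join(out_dir, "Price_aal.csv"), sep=";", index_col=0)

print(sorted(os.listdir(out_dir)))
print(aal)
```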

If you need to reset the index of each sub-dataframe, do this:

for name, group in df.groupby('Name'):
    # reset_index() renumbers rows from 0 and keeps the old index as an 'index' column
    gf = group.reset_index()
    print(gf)

which prints:

   index Name  price
0      0  aal      1
1      1  aal      2
2      2  aal      3
3      3  aal      4
4      4  aal      5
5      5  aal      6
   index Name  price
0      6  bll      7
1      7  bll      8
2      8  bll      9
3      9  bll      8
   index Name  price
0     10  dll      7
1     11  dll     56
2     12  dll      4
3     13  dll      3
4     14  dll      3
5     15  dll      5
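As the output shows, reset_index() keeps the old row labels as a new index column. If those labels are not needed, passing drop=True discards them. A short sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["aal"] * 6 + ["bll"] * 4 + ["dll"] * 6,
    "price": [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 56, 4, 3, 3, 5],
})

# drop=True renumbers each group from 0 without adding an 'index' column
reset = {name: group.reset_index(drop=True) for name, group in df.groupby("Name")}

print(reset["bll"])
```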
2021-11-23 17:49:39

The dataset has other fields such as open, high, low, close... How do I include them when writing each df to CSV?
eneko valero

@enekovalero You do not need to do anything beyond the code above. My df was simply an example. All columns will be in every produced dataframe; it is only filtered on Name. For your future questions (or if you want me to test on your actual data), do not post images. Instead, run df.head(50).to_dict() (or any number instead of 50) and paste the result between ``` <here> ``` in your SO question.
Serge de Gosson de Varennes
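To illustrate that every column is carried over, here is a sketch with hypothetical rows that use the column names mentioned above (all values are made up), plus the to_dict() round trip suggested for sharing sample data:

```python
import pandas as pd

# Hypothetical rows using the column names mentioned above (values are made up)
df = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11", "2013-02-08"],
    "open": [10.0, 11.0, 20.0],
    "high": [10.5, 11.5, 20.5],
    "low": [9.5, 10.5, 19.5],
    "close": [10.2, 11.2, 20.2],
    "Name": ["AAL", "AAL", "AAPL"],
})

# Filtering on Name keeps every column, so to_csv would write all of them
cols = {name: list(group.columns) for name, group in df.groupby("Name")}
print(cols)

# A dict like this can be pasted into a question and rebuilt by the reader
sample = df.head(2).to_dict()
rebuilt = pd.DataFrame(sample)
print(rebuilt)
```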

@eneko valero, I don't think your concept makes much sense, if any at all. Can you try the concept referenced below? Probably just the first quarter applies to what you are doing. github.com/ASH-WICUS/Notebooks/blob/master/…
ASH

This should be doable with boolean indexing:

list_of_dataframes = [df[df.Name == name] for name in df.Name.unique()]
2021-11-23 16:22:54

This will work, but I imagine it will be pretty slow on a large dataset because you have to compute the entire boolean series for each unique name.
Kevin Roche
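One rough way to check that concern is to build the same list both ways and time them with the standard-library timeit. The synthetic frame (100 hypothetical tickers with 1,000 rows each) is an assumption of this sketch; timings depend on the machine and pandas version, so no numbers are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100 hypothetical tickers with 1,000 rows each
names = np.repeat([f"T{i:03d}" for i in range(100)], 1000)
df = pd.DataFrame({"Name": names, "close": np.arange(len(names), dtype=float)})


def by_mask(frame):
    # One full boolean comparison over the Name column per unique name
    return [frame[frame.Name == name] for name in frame.Name.unique()]


def by_groupby(frame):
    # groupby computes the group labels once, then yields each group
    return [group for _, group in frame.groupby("Name")]


masked = by_mask(df)
grouped = by_groupby(df)

# Same frames either way (unique() and groupby both give T000..T099 in order here)
same = len(masked) == len(grouped) and all(a.equals(b) for a, b in zip(masked, grouped))
print("identical:", same)
print("mask    (s):", timeit.timeit(lambda: by_mask(df), number=1))
print("groupby (s):", timeit.timeit(lambda: by_groupby(df), number=1))
```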
