Split dataframe into smaller dataframes by the values of the Name column

Question 1

I am doing a time series/LSTM assignment with a stock dataset: https://www.kaggle.com/camnugent/sandp500

I need to split the main dataframe into smaller dataframes, one per company name. Is there a fast way to do this? There are tens of company names, and although I have seen that this can be done with iloc, that takes too much effort.

import pandas as pd

# parse_dates already converts the 'date' column to datetime,
# so a separate pd.to_datetime call is not needed
df = pd.read_csv('all_stocks_5yr.csv', parse_dates=['date'])

grouped_df = df.groupby('Name')

The data contains different companies with different names, and what I want is a separate dataframe for each company. Help is much appreciated.

Question 2

Assume this is your dataframe:

   Name  price
0   aal      1
1   aal      2
2   aal      3
3   aal      4
4   aal      5
5   aal      6
6   bll      7
7   bll      8
8   bll      9
9   bll      8
10  dll      7
11  dll     56
12  dll      4
13  dll      3
14  dll      3
15  dll      5

Then do the following:

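# use separate loop names so the original df is not overwritten
for name, group in df.groupby('Name'):
    # each company goes to its own file, e.g. Price_aal.csv
    group.to_csv(f"Price_{name}.csv", sep=";")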

That'll save every sub-dataframe as its own CSV file. To view what the code does:

for name, group in df.groupby('Name'):
    print(group)

returns:

  Name  price
0  aal      1
1  aal      2
2  aal      3
3  aal      4
4  aal      5
5  aal      6
  Name  price
6  bll      7
7  bll      8
8  bll      9
9  bll      8
   Name  price
10  dll      7
11  dll     56
12  dll      4
13  dll      3
14  dll      3
15  dll      5
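If the goal is to keep one dataframe per company in memory rather than on disk, the same groupby loop can fill a dictionary. This is only a minimal sketch: the toy data copies the example above, and the variable name frames is made up for illustration.

```python
import pandas as pd

# Toy data copied from the example above (not the Kaggle file).
df = pd.DataFrame({
    "Name": ["aal"] * 6 + ["bll"] * 4 + ["dll"] * 6,
    "price": [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 56, 4, 3, 3, 5],
})

# One sub-dataframe per company, keyed by the company name.
# 'frames' is a hypothetical name chosen for this sketch.
frames = {name: group for name, group in df.groupby("Name")}

print(list(frames))    # the company names found in the column
print(frames["bll"])   # only the rows belonging to 'bll'
```

Each company can then be looked up by name, for example frames["aal"], without creating a separate variable for every company.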

If you need to reset the index in every df, do this:

for name, group in df.groupby('Name'):
    gf = group.reset_index()
    print(gf)

which gives:

   index Name  price
0      0  aal      1
1      1  aal      2
2      2  aal      3
3      3  aal      4
4      4  aal      5
5      5  aal      6
   index Name  price
0      6  bll      7
1      7  bll      8
2      8  bll      9
3      9  bll      8
   index Name  price
0     10  dll      7
1     11  dll     56
2     12  dll      4
3     13  dll      3
4     14  dll      3
5     15  dll      5
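The output above keeps the old row labels as an extra column called index. If that column is not wanted, reset_index accepts drop=True, which discards the old labels instead. A small sketch, using a shortened toy dataframe and a made-up dictionary name (clean_frames):

```python
import pandas as pd

# Shortened toy data in the same shape as the example above.
df = pd.DataFrame({
    "Name": ["aal", "aal", "bll", "bll", "bll"],
    "price": [1, 2, 7, 8, 9],
})

# 'clean_frames' is a hypothetical name used only in this sketch.
clean_frames = {}
for name, group in df.groupby("Name"):
    # drop=True renumbers the rows from 0 without adding an 'index' column
    clean_frames[name] = group.reset_index(drop=True)
    print(clean_frames[name])
```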

Question 3

This should be doable with boolean indexing:

list_of_dataframes = [df[df.Name == name] for name in df.Name.unique()]
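One thing to watch with the list approach is that the company names are no longer attached to the pieces. Since unique() returns the names in order of first appearance, the list can be paired back up with them. A minimal sketch on toy data, assuming the same Name/price layout as in the example data earlier in the thread:

```python
import pandas as pd

# Toy data in the same shape as the earlier example (not the Kaggle file).
df = pd.DataFrame({
    "Name": ["aal"] * 6 + ["bll"] * 4 + ["dll"] * 6,
    "price": [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 56, 4, 3, 3, 5],
})

# Names in order of first appearance; the list below follows the same order.
names = df["Name"].unique()
list_of_dataframes = [df[df["Name"] == name] for name in names]

# Pair each sub-dataframe with its company name again.
for name, sub in zip(names, list_of_dataframes):
    print(name, len(sub))
```

Note that each boolean mask scans the whole Name column, so the column is traversed once per company.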

Serge de Gosson de Varennes · Answer 1 · 2021-11-23T17:49:39