Qualitative and quantitative analysis of the DOAJ database with R

Romina De León y Gimena del Rio

(HDLAB CONICET)

Designed and maintained by Romina De León

Objective

This notebook analyzes and visualizes the DOAJ journal database. It downloads the current CSV export each time it runs, so the results reflect the state of DOAJ at run time.

Libraries used

The libraries loaded for this notebook support the statistical analysis and visualization of the database that DOAJ publishes in CSV format.
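
The rendered notebook does not show the chunk that loads the libraries. Judging from the functions called later (for example `as_tibble`, `str_extract`, `floor_date`, `girafe`, `plot_ly`, `ne_countries`, `countrycode`, `hchart`, `geom_treemap`, and the scico scales), the package list below is a reasonable guess, not the authors' confirmed setup. The sketch only reports which of these packages are missing, so it runs even on a fresh R installation.

```r
# Assumed package list, inferred from the functions used in this notebook
pkgs <- c("tidyverse", "lubridate", "sf", "rnaturalearth", "countrycode",
          "ggiraph", "plotly", "highcharter", "treemapify", "scico",
          "viridisLite", "scales")

# requireNamespace() returns FALSE instead of failing when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```

Once everything is installed, each package would be attached with `library()`, for example `library(tidyverse)`.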

# Official DOAJ URL
url_doaj <- "https://doaj.org/csv"
# Download the CSV to a temporary file
temp <- tempfile(fileext = ".csv")
download.file(url_doaj, destfile = temp, method = "libcurl")

# Read the database
journal <- read.csv(temp, stringsAsFactors = FALSE)

# Convert to a tibble
journal <- as_tibble(journal)
head(journal)
# A tibble: 6 × 51
  Journal.title Journal.URL URL.in.DOAJ When.did.the.journal…¹ Alternative.title
  <chr>         <chr>       <chr>                        <int> <chr>            
1 Nyimak: Jour… http://jur… https://do…                   2017 ""               
2 MCBS (Molecu… https://ce… https://do…                   2017 ""               
3 Acta Univers… https://ka… https://do…                   2014 "AUC Philologica"
4 Cuadernos pa… https://re… https://do…                   2016 "CILH"           
5 Enfances, Fa… https://jo… https://do…                   2014 ""               
6 RUDN Journal… http://jou… https://do…                   2008 "Vestnik Rossijs…
# ℹ abbreviated name:
#   ¹​When.did.the.journal.start.to.publish.all.content.using.an.open.license.
# ℹ 46 more variables: Journal.ISSN..print.version. <chr>,
#   Journal.EISSN..online.version. <chr>, Keywords <chr>,
#   Languages.in.which.the.journal.accepts.manuscripts <chr>, Publisher <chr>,
#   Country.of.publisher <chr>, Other.organisation <chr>,
#   Country.of.other.organisation <chr>, Journal.license <chr>, …

Selecting columns from the database

head(journal[, c("Journal.title", "Subjects", "Keywords")], 15)
# A tibble: 15 × 3
   Journal.title                                              Subjects  Keywords
   <chr>                                                      <chr>     <chr>   
 1 Nyimak: Journal of Communication                           Language… communi…
 2 MCBS (Molecular and Cellular Biomedical Sciences)          Medicine… biomedi…
 3 Acta Universitatis Carolinae: Philologica                  Language… philolo…
 4 Cuadernos para la Investigación de la Literatura Hispánica Language… spanish…
 5 Enfances, Familles, Générations                            Geograph… gender …
 6 RUDN Journal of Political Science                          Politica… politic…
 7 Éducation et Socialisation                                 Educatio… educati…
 8 RUDN Journal of Russian History                            History … russian…
 9 Turkish Journal of Bioscience and Collections              Science:… bioscie…
10 The Rehabilitation Journal                                 Social S… speech-…
11 Transitare                                                 Social S… tourism…
12 European Journal of Biology                                Science:… biology…
13 Revue d'ethnoécologie                                      Geograph… anthrop…
14 Cultura, Educación,  Sociedad                              Educatio… human s…
15 Tarih Dergisi                                              History … history 
print(paste("Total de publicaciones en la base de datos de DOAJ:", length(journal$Journal.title)))
[1] "Total de publicaciones en la base de datos de DOAJ: 22431"

Data manipulation and cleaning

List the column names before renaming them

colnames(journal)
 [1] "Journal.title"                                                              
 [2] "Journal.URL"                                                                
 [3] "URL.in.DOAJ"                                                                
 [4] "When.did.the.journal.start.to.publish.all.content.using.an.open.license."   
 [5] "Alternative.title"                                                          
 [6] "Journal.ISSN..print.version."                                               
 [7] "Journal.EISSN..online.version."                                             
 [8] "Keywords"                                                                   
 [9] "Languages.in.which.the.journal.accepts.manuscripts"                         
[10] "Publisher"                                                                  
[11] "Country.of.publisher"                                                       
[12] "Other.organisation"                                                         
[13] "Country.of.other.organisation"                                              
[14] "Journal.license"                                                            
[15] "License.attributes"                                                         
[16] "URL.for.license.terms"                                                      
[17] "Machine.readable.CC.licensing.information.embedded.or.displayed.in.articles"
[18] "Author.holds.copyright.without.restrictions"                                
[19] "Copyright.information.URL"                                                  
[20] "Review.process"                                                             
[21] "Review.process.information.URL"                                             
[22] "Journal.plagiarism.screening.policy"                                        
[23] "URL.for.journal.s.aims...scope"                                             
[24] "URL.for.the.Editorial.Board.page"                                           
[25] "URL.for.journal.s.instructions.for.authors"                                 
[26] "Average.number.of.weeks.between.article.submission.and.publication"         
[27] "APC"                                                                        
[28] "APC.information.URL"                                                        
[29] "APC.amount"                                                                 
[30] "Journal.waiver.policy..for.developing.country.authors.etc."                 
[31] "Waiver.policy.information.URL"                                              
[32] "Has.other.fees"                                                             
[33] "Other.fees.information.URL"                                                 
[34] "Preservation.Services"                                                      
[35] "Preservation.Service..national.library"                                     
[36] "Preservation.information.URL"                                               
[37] "Deposit.policy.directory"                                                   
[38] "URL.for.deposit.policy"                                                     
[39] "Persistent.article.identifiers"                                             
[40] "Does.the.journal.comply.to.DOAJ.s.definition.of.open.access."               
[41] "Continues"                                                                  
[42] "Continued.By"                                                               
[43] "LCC.Codes"                                                                  
[44] "Subscribe.to.Open"                                                          
[45] "Mirror.Journal"                                                             
[46] "Open.Journals.Collective"                                                   
[47] "Subjects"                                                                   
[48] "Added.on.Date"                                                              
[49] "Last.updated.Date"                                                          
[50] "Number.of.Article.Records"                                                  
[51] "Most.Recent.Article.Added"                                                  
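
The dots in these names are not part of the DOAJ file itself. `read.csv` uses `check.names = TRUE` by default, which passes each header through `make.names` and replaces spaces, parentheses, and other invalid characters with dots. The header strings below are assumptions reconstructed from the dotted names above, used only to illustrate the conversion.

```r
# Hypothetical original headers, reconstructed from the dotted names
raw_headers <- c("Journal title",
                 "Journal ISSN (print version)",
                 "URL for journal's aims & scope")

# make.names() is what read.csv() applies when check.names = TRUE
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Passing `check.names = FALSE` to `read.csv` would keep the original headers, at the cost of needing backticks around them in `select` and `rename`.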

Create a new tibble with the columns needed for the analysis

journal.select <- journal %>%
  select(Journal.title, Country.of.publisher,
         Languages.in.which.the.journal.accepts.manuscripts,
         Journal.license, Publisher, Review.process, Subjects, APC,
         Persistent.article.identifiers, Keywords, Added.on.Date)

Rename and simplify the column names

journal.select <- journal.select %>%
  rename(title = Journal.title,
         country = Country.of.publisher,
         language = Languages.in.which.the.journal.accepts.manuscripts,
         License = Journal.license,
         Review = Review.process,
         Ids = Persistent.article.identifiers,
         Added = Added.on.Date)

Convert the Added column (originally Added.on.Date) to a year-month-day date

journal.select$Added <- as.Date(journal.select$Added) 
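
When `as.Date` parses a character vector with a date-only format, any trailing characters are ignored, so an ISO 8601 timestamp is reduced to its date part. The sample values below are invented for illustration; the exact timestamp format in the DOAJ export is an assumption.

```r
# Invented sample timestamps (assumed ISO 8601)
added_raw <- c("2017-05-12T10:23:45Z", "2003-11-02T00:00:00Z")

# An explicit format documents the expectation; the time part is ignored
added <- as.Date(added_raw, format = "%Y-%m-%d")

print(added)
print(format(added, "%Y"))  # year only, as used later for axis labels
```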

Cleaning the “Subjects” column

The cleaning keeps only the first two words of each journal's subject string, which simplifies manipulation, analysis, and visualization. The same code block also normalizes country names, so that any name containing “Russian” becomes “Russia”, for example.

journal.select$Subjects <- str_extract(journal.select$Subjects, "\\w+(?:[^\\w]+\\w+){0,1}")
# Remove punctuation
journal.select$Subjects <- gsub("[[:punct:]]", "", journal.select$Subjects)
journal.select <- journal.select %>%
  mutate(country = trimws(country),
         country = case_when(
      str_detect(country, "Bolivia")   ~ "Bolivia",
      str_detect(country, "Venezuela") ~ "Venezuela",
      str_detect(country, "Russian")   ~ "Russia",
      str_detect(country, "Iran")      ~ "Iran",
      str_detect(country, "Korea")     ~ "Korea",
      str_detect(country, "Moldova")   ~ "Moldova",
      str_detect(country, "Congo")     ~ "Congo",
      str_detect(country, "Tanzania")  ~ "Tanzania",
      str_detect(country, "Palestine") ~ "Palestine",
      TRUE ~ country)) %>%
  mutate(Subjects = str_trim(Subjects)) %>%
  mutate(Subjects = sapply(Subjects, function(x) {
    words <- str_split(x, "\\s{2,}|,|\\s*\\band\\b\\s*|\\s+")[[1]]  # split on spaces, commas, or the word "and"
    words <- unique(words[words != ""])  # drop empty strings and duplicates
    paste(words, collapse = " ")
  }))
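
To see what this cleaning does, the same three steps can be applied to a few subject strings. The inputs below are invented examples in the style of DOAJ subject classifications, and `clean_subject` is a hypothetical helper name that is not part of the notebook.

```r
library(stringr)

# Hypothetical helper that repeats the three cleaning steps for one string
clean_subject <- function(x) {
  # 1. Keep the first word plus, if present, the second word
  x <- str_extract(x, "\\w+(?:[^\\w]+\\w+){0,1}")
  # 2. Drop punctuation left between the two words
  x <- gsub("[[:punct:]]", "", x)
  # 3. Split on spaces, commas, or "and"; drop empties and duplicates
  words <- str_split(str_trim(x), "\\s{2,}|,|\\s*\\band\\b\\s*|\\s+")[[1]]
  paste(unique(words[words != ""]), collapse = " ")
}

# Invented inputs in the style of DOAJ subject strings
subjects <- c("Medicine: Public aspects of medicine",
              "Education: Education (General)",
              "Language and Literature")

cleaned <- vapply(subjects, clean_subject, character(1), USE.NAMES = FALSE)
print(cleaned)
```

Because the word "and" is treated as a separator, a label such as "Language and" cannot survive the cleaning; it collapses to "Language".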

Cleaning the “language” and “Ids” (persistent identifiers) columns

A helper function sorts the languages or identifiers listed in each cell alphabetically, so that the same combination always appears as the same string. This simplifies visualization and manipulation of the tibble.

# Sort the comma-separated items (languages or identifiers) within one cell
sort_columns <- function(column_list) {
  sorted_columns <- sort(unlist(strsplit(column_list, ", ")))
  return(paste(sorted_columns, collapse = ", "))
}
# Apply the function to every cell of the 'language' and 'Ids' columns
journal.select <- journal.select %>%
  mutate(across(c(language, Ids), ~ sapply(.x, sort_columns)))
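
The effect of the sorting is easiest to see on a small vector. The sketch below repeats the notebook's `sort_columns` and applies it to invented values; `vapply` with `USE.NAMES = FALSE` is used here instead of `sapply` so the result is a plain, unnamed character vector.

```r
# Same helper as in the notebook: sort the comma-separated items in one cell
sort_columns <- function(column_list) {
  sorted_columns <- sort(unlist(strsplit(column_list, ", ")))
  paste(sorted_columns, collapse = ", ")
}

# Invented cells: the first two list the same languages in different order
languages <- c("Spanish, English", "English, Spanish", "Portuguese")

normalized <- vapply(languages, sort_columns, character(1), USE.NAMES = FALSE)
print(normalized)
print(table(normalized))  # the two orderings now count as one combination
```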

Caption function

# Builds the citation text shown under each chart
add_caption <- function(author = "Romina De León y Gimena del Rio", year = format(Sys.Date(), "%Y")) {
  paste0("Citar como: ", author, ", ", year, 
         ". Análisis de revistas latinoamericanas en DOAJ.")
}

Total journals in DOAJ

Create a data frame with the number and percentage of journals per country

porcen_journal <- journal.select %>%
  group_by(country)%>%
  count()%>%
  ungroup()%>%
  mutate(percentage= round(n / sum(n) * 100, 2)) %>%
  bind_rows(data.frame(country = "Total", n = NA, percentage = sum(.$percentage)))
porcen_journal[order(porcen_journal$n, decreasing = TRUE),] 
# A tibble: 141 × 3
   country            n percentage
   <chr>          <int>      <dbl>
 1 Indonesia       2612      11.6 
 2 United Kingdom  2255      10.0 
 3 Brazil          1456       6.49
 4 United States   1304       5.81
 5 Iran            1043       4.65
 6 Spain           1004       4.48
 7 Poland           964       4.3 
 8 Switzerland      795       3.54
 9 Russia           649       2.89
10 Türkiye          645       2.88
# ℹ 131 more rows
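
A quick base R cross-check of the same calculation, on invented data, confirms that each share is computed against the total number of journals. The country values here are made up for illustration.

```r
# Invented sample: one entry per journal
country <- c("Brazil", "Brazil", "Brazil", "Spain", "Spain",
             "Chile", "Peru", "Peru")

counts <- as.data.frame(table(country), stringsAsFactors = FALSE)
names(counts) <- c("country", "n")

# Share of each country over all journals, rounded as in the notebook
counts$percentage <- round(counts$n / sum(counts$n) * 100, 2)

# Sort from most to fewest journals
counts <- counts[order(counts$n, decreasing = TRUE), ]
print(counts)
print(sum(counts$percentage))
```

With real data, rounding each share to two decimals means the total can differ slightly from 100.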

Visualizations

This section is split into charts covering journals worldwide and charts covering Latin America.

Map of journals by country

p1 <- suppressWarnings(
ne_countries(scale = "large", returnclass = "sf") %>% 
left_join(
    porcen_journal %>%
    filter(!country %in% c("Total")) %>%
      mutate(
             country = trimws(country),
             country_std = case_when(
                country %in% c("United States", "USA") ~ "United States of America",
                TRUE ~ countrycode(country, origin = "country.name", destination = "country.name")
                ),
             country_std = coalesce(country_std, country),
             color_point = case_when( n <= 20 ~ "#91A1AF",
                                      n <= 50 ~ "#21BCFF",
                                      n <= 100 ~ "#F5276C",
                                      n <= 200 ~ "#2C0995",
                                      n <= 300 ~ "#27F5B0",
                                      n <= 450 ~ "#009E3F",
                                      n <= 600 ~ "#2e86c1",
                                      n <= 800 ~ "#1b4f72",
                                      n <= 1000 ~ "#884ea0",
                                      n <= 1250 ~ "#a569bd",
                                      n <= 1500 ~ "#af7ac5",
                                      n <= 1750 ~ "#d98880",
                                      n <= 2500 ~ "#8965F6",
                                      n <= 3500 ~"#729509",
                                      TRUE ~ "darkblue"
    )
      ),
    by = c("name" = "country_std")
  ) %>% 
  filter(!is.na(geometry)) %>%
  mutate(point_geom = st_point_on_surface(geometry),
         tooltip_text = paste0("<strong>", name, "</strong><br/>Revistas: ", n)) %>%
  ggplot() +
  geom_sf(fill = "gray90", color = "white", linewidth = 0.1) +
  geom_point_interactive(
    aes(
      geometry = point_geom, 
      size = n,
     color = color_point,
     tooltip = tooltip_text,
     data_id = name
    ),
    stat = "sf_coordinates",
    alpha = 0.7
  ) +
  scale_size_continuous(range = c(2, 10), guide = "none") +
  scale_color_identity(
  name = "Rangos de n",
  guide = "legend",
  breaks = c("#91A1AF", "#21BCFF", "#F5276C", "#2C0995",
             "#27F5B0", "#009E3F", "#2e86c1", "#1b4f72",
             "#884ea0", "#a569bd", "#af7ac5", "#d98880",
             "#8965F6", "#729509", "darkblue"),
  labels = c("≤ 20", "21–50", "51–100", "101–200",
             "201–300", "301–450", "451–600", "601–800",
             "801–1000", "1001–1250", "1251–1500", "1501–1750",
             "1751–2500", "2501–3500", "> 3500")
) +
  labs(
    title = "Distribución de revistas por país",
      x = NULL,
      y = NULL,
    caption = add_caption()
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 18, face = "bold"),
  legend.title = element_text(size = 14, face = "bold"),
  legend.text = element_text(size = 12),
  legend.position = "bottom"))

girafe(
  ggobj = p1,
  options = list(
    opts_hover(css = "fill-opacity:1;stroke:black;stroke-width:1pt;"),
    opts_tooltip(css = "background-color:white;color:black;padding:5px;border-radius:5px;font-family:sans-serif;"),
    opts_toolbar(saveaspng = TRUE)
  ),
  width_svg = 10,
  height_svg = 7
)
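
The chain of `case_when` thresholds in the map assigns each country to a range of journal counts. As an alternative sketch, base R's `cut` can produce the same kind of binning from a vector of break points. The break points and labels below are a shortened, illustrative subset, not the notebook's full scale.

```r
# Invented journal counts per country
n <- c(5, 20, 21, 150, 4000)

# Intervals are closed on the right by default: (0,20], (20,50], ...
bins <- cut(n,
            breaks = c(0, 20, 50, 100, 200, Inf),
            labels = c("<= 20", "21-50", "51-100", "101-200", "> 200"))

print(data.frame(n = n, range = bins))
```

Keeping the break points and labels in two parallel vectors makes it harder for the legend labels to drift away from the thresholds.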

Time series by year added

p2 <- journal.select %>%
  mutate( year_month = floor_date(Added, "year")) %>%
  count(year_month) %>%
  ggplot(aes(x = year_month, y = n, group = 1)) +
  geom_line(color = "#2c0fb1", linewidth = 0.6) +
  geom_point(
    aes(
      text = paste(
        "Año:", format(year_month, "%Y"),
        "<br>Incorporaciones:", n
      )
    ),
    color = "#f93b20", size = 1) +
  theme_bw(base_size = 12) +
  scale_x_date(date_labels = "%Y", date_breaks = "2 year") +
  labs(x = "Año de incorporación", y = "Cantidad de incorporaciones", 
       title = "Línea de tiempo sobre incorporación de publicaciones a DOAJ", caption = add_caption()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
       axis.text.y = element_text(hjust = 1)
       )
Warning in geom_point(aes(text = paste("Año:", format(year_month, "%Y"), :
Ignoring unknown aesthetics: text
ggplotly(p2, tooltip = "text")
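
`floor_date` from lubridate rounds each date down to the start of the chosen unit, which is what groups the journals into yearly or quarterly bins before they are counted. The dates below are invented.

```r
library(lubridate)

# Invented dates on which journals were added
added <- as.Date(c("2019-08-15", "2019-02-03", "2020-11-30"))

by_year    <- floor_date(added, "year")     # first day of each year
by_quarter <- floor_date(added, "quarter")  # first day of each quarter

print(by_year)
print(by_quarter)
print(table(format(by_year, "%Y")))  # number of additions per year
```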

Breakdown by country, limited to countries with at least 100 journals

p3 <- journal.select %>%
  mutate( year = floor_date(Added, "year")
  ) %>%
  count(country, year) %>% 
  group_by(country) %>%
  filter(sum(n) >= 100) %>%  
  ungroup() %>% 
  ggplot(aes(x = year, y = country, fill = n)) +
  geom_tile_interactive(
    aes(
      # Tooltip con información detallada
      tooltip = paste0(
        "<b>País:</b> ", country, "<br>",
        "<b>Año:</b> ", format(year, "%Y"), "<br>",
        "<b>Cantidad:</b> ", n
      ),
      # data_id único para cada celda (combinación país-año)
      data_id = paste0(country, year)
    ),
    color = "white" # Bordes blancos entre celdas
  ) +
  
  scale_fill_viridis_c(option = "C") +
  labs(
    x = "Año de incorporación a DOAJ",
    y = "País",
    title = "Registro de incorporación de publicaciones por país y año",
    fill = "Cantidad",
    caption = tryCatch(add_caption(), error = function(e) "") 
  ) +
  theme_classic(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "right"
  )

# Renderizado interactivo
girafe(
  ggobj = p3,
  width_svg = 10,
  height_svg = 6,
  options = list(
    opts_tooltip(css = "background-color:black; color:white; padding:5px; border-radius:5px;"),
    opts_hover(css = "stroke:black; stroke-width:2px;")))

Publication languages across all DOAJ journals

  • First chart: language combinations with 25 to 800 journals
  • Second chart: language combinations with more than 800 journals
p4 <- journal.select %>%
  count(language) %>%
  filter(n >= 25 & n <= 800) %>%
  ggplot(aes(x = n,
    y = fct_reorder(language, n),  # order bars by number of journals
    fill = n)) +
    geom_col_interactive(
    aes(
      # tooltip
      tooltip = paste0("Idioma: ", language, "\nCantidad: ", n),
      data_id = language
    ),
    show.legend = FALSE
  ) +
  #  viridis
  scale_fill_viridis_c(option = "D") +
  # Títulos y captions
  labs(
    title = "Idiomas de publicación de revistas en todo el mundo",
    x = "Cantidad de revistas",
    y = "Idiomas",
    caption = tryCatch(add_caption(), error = function(e) "") 
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.grid.major.y = element_blank() # Limpia lineas horizontales
  )


girafe(
  ggobj = p4,
  width_svg = 8,
  height_svg = 6,
  options = list(
    opts_hover(css = "fill-opacity: 1; stroke: black; stroke-width: 1.5px;"),
    opts_hover_inv(css = "opacity: 0.2;") # efecto de desvanecer
  )
)
p5 <- journal.select %>%
  count(language) %>%
  filter(n > 800) %>%  # strictly more than 800, so no overlap with the first chart
  ggplot(aes(x = n, y = fct_reorder(language, n), fill = n)) +
    geom_col_interactive(aes(
      tooltip = paste0("Idioma: ", language, "\nCantidad: ", n),
      data_id = language
    ),
    show.legend = FALSE ) +
  scale_fill_gradient_interactive(
  low = "aquamarine", high = "purple",
  labels = scales::comma_format(big.mark = ".", decimal.mark = ",") 
) + 
  labs(
    title = "Idiomas de publicación de revistas en todo el mundo",
    x = "Cantidad de revistas",
    y = "Idiomas",
    caption = tryCatch(add_caption(), error = function(e) "") 
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.grid.major.y = element_blank())

girafe(
  ggobj = p5,
  width_svg = 8,
  height_svg = 5,
  options = list(
    opts_hover(css = "fill-opacity: 0.8; stroke: black; stroke-width: 1px;")))
#saveWidget(p5, "idiomas_pub_DOAJ_may800.html", selfcontained = FALSE)

Publication languages by country

p6 <- journal.select %>%
  group_by(country, language) %>%
  count() %>%
  ungroup() %>%
  filter(n >= 75 & n <= 5000) %>%
  mutate(language = fct_reorder(language, n)) %>%
  plot_ly(x = ~country, y = ~language, z = ~n,
        type = "heatmap", colors = viridisLite::viridis(100, option = "C"), 
        hovertemplate = paste(
      "País: %{x}<br>",
      "Idioma: %{y}<br>",
      "Publicaciones: %{z}<extra></extra>")
    ) %>%
  layout(title = list(
      text = "Publicaciones con discriminación de idiomas según países",
      x = 0.5,  
      xanchor = "center"
    ),
    xaxis = list(tickangle = -45, tickfont = list(size = 8)),
    yaxis = list(tickangle = -25, tickfont = list(size = 8)),
    margin = list(l = 100, r = 20, t = 50, b = 50), 
    annotations = list(
      list(
        x = 0.35, y = -0.39, 
        xref = "paper", yref = "paper",
        showarrow = FALSE,
        xanchor = "left",
        yanchor = "top",
        font = list(size = 8),
        text = add_caption()
      )
    )
  )

p6

APC (article processing charge) chart

p14 <- journal.select %>%
  count(Publisher, APC, name = "n") %>%
  group_by(Publisher) %>%
  filter(sum(n) >= 10) %>%   # keep publishers with at least 10 journals
  ungroup() %>% 
  ggplot(aes(area = n, subgroup = Publisher, fill = APC)) +
  geom_treemap() +
  geom_treemap_subgroup_border(color = "white", size = 1) +
  geom_treemap_text(aes(label = Publisher), place = "centre", grow = TRUE, reflow = TRUE, min.size = 1) +
  scale_fill_manual(values = c("No" = "#A6CEE3", "Yes" = "#1F78B4")) +
  labs(
    title = "Registro de APC por Editoriales",
    fill = "APC",
    caption = add_caption()
  ) +
  theme( legend.position = "right" )

p14

Journals in Latin America

Latin America

Select the data for Latin American countries to analyze their journals

# Countries counted as Latin America in this analysis
selected_countries <- c("Brazil", "Argentina", "Mexico", "Colombia", "Ecuador", "Costa Rica", "Cuba", "Bolivia", "Dominican Republic", "El Salvador", "Guatemala", "Honduras", "Nicaragua", "Panama", "Chile", "Paraguay", "Peru", "Uruguay", "Venezuela")
filtered_data <- porcen_journal %>% filter(country %in% selected_countries)
head(filtered_data)
# A tibble: 6 × 3
  country        n percentage
  <chr>      <int>      <dbl>
1 Argentina    404       1.8 
2 Bolivia       12       0.05
3 Brazil      1456       6.49
4 Chile        194       0.86
5 Colombia     448       2   
6 Costa Rica    75       0.33
journal.amlat <- journal.select[journal.select$country %in% selected_countries, ]

print(paste("Total de publicaciones en América Latina:", length(journal.amlat$title)))
[1] "Total de publicaciones en América Latina: 3346"
head(journal.amlat)
# A tibble: 6 × 11
  title  country language License Publisher Review Subjects APC   Ids   Keywords
  <chr>  <chr>   <chr>    <chr>   <chr>     <chr>  <chr>    <chr> <chr> <chr>   
1 Trans… Mexico  English… CC BY-… Universi… Doubl… Social … No    ""    tourism…
2 Cultu… Colomb… Spanish  CC BY   Universi… Doubl… Educati… No    "DOI" human s…
3 Retos… Cuba    Spanish  CC BY-… Centro d… Peer … Social … No    ""    territo…
4 Revis… Brazil  Portugu… CC BY-… Universi… Doubl… Educati… No    ""    adminis…
5 Coope… Colomb… English… CC BY-… Edicione… Doubl… Politic… No    "DOI" associa…
6 Memor… Brazil  Portugu… CC BY   Universi… Doubl… Philoso… No    "DOI" history…
# ℹ 1 more variable: Added <date>

Interactive chart of the number of journals in Latin America

# Latin America
p8 <- hchart(
  filtered_data,
  type = "pie",
  hcaes(x = country, y = percentage), 
  dataLabels = list(enabled = TRUE),  
  showInLegend = FALSE
) %>%
  hc_title(useHTML = TRUE,
    text = paste0(
    "<b>Porcentaje de las </b>",
    sum(filtered_data$n),
    "<b> publicaciones en América Latina</b>"
  )) %>%
  hc_subtitle(useHTML = TRUE,
              text = paste0("<i>Respecto a las ", length(journal.select$title) ," publicaciones de todo el mundo</i>")) %>%
  hc_exporting(
    enabled = TRUE, 
    filename = "paises_total"
  ) %>%
  hc_tooltip(
    pointFormat = "{point.percentage:.1f}% revistas"
  ) %>%
  hc_legend(
    enabled = FALSE, 
    layout = "horizontal",
    align = "center",
    verticalAlign = "bottom",
    y = 8
  )%>%
  hc_credits(
    enabled = TRUE,
  text = add_caption(),
  href = "https://github.com/rominicky/analisis-doaj",
  itemStyle = list(fontSize = "8px", fontWeight = "normal"),
  position = list(align = "left", x = 10, y = -5)
)

p8
#| scrolled: true

print(journal.amlat$Subjects[1:15])
         Social Sciences    Education  Philosophy          Social Sciences 
       "Social Sciences"   "Education Philosophy"        "Social Sciences" 
     Education Education        Political science    Philosophy Psychology 
             "Education"      "Political science"  "Philosophy Psychology" 
     Medicine Gynecology    Philosophy Psychology          Medicine Public 
   "Medicine Gynecology"  "Philosophy Psychology"        "Medicine Public" 
     Education Education          Social Sciences   Geography Anthropology 
             "Education"        "Social Sciences" "Geography Anthropology" 
               Fine Arts    Philosophy Psychology                Education 
             "Fine Arts"  "Philosophy Psychology"              "Education" 

Vector used to keep only the Humanities and Social Sciences

social_sc <- c("Philosophy Psychology", "Geography Anthropology", "Education Theory", "Language and", "Political science", "History", "Education History", "Education Social", "History General", "History America", "Language", "Philosophy", "Bibliography Library", "Auxiliary sciences", "Education Special", "Education", "Social Sciences")

Timeline of Latin American journals added to DOAJ

options(repr.plot.width = 14, repr.plot.height = 10)
p9 <- journal.amlat %>%
  mutate( year_month = floor_date(Added, "quarter")) %>%
  count(year_month) %>%
  ggplot(aes(x = year_month, y = n)) +
  geom_line(color = "#2c0fb1", linewidth = 0.6) +
  geom_point(aes(
      text = paste(
        "Año:", format(year_month, "%Y"),
        "<br>Incorporaciones:", n
      )
    ),
    color = "#f93b20", size = 1) +
  theme_bw(base_size = 12) +
  scale_x_date(date_labels = "%Y", date_breaks = "1 year") +
  labs(x = "Trimestres del año", y = "Cantidad de incorporaciones", 
       title = "Línea de tiempo sobre incorporación de publicaciones de América Latina a DOAJ", caption = add_caption()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
       axis.text.y = element_text(hjust = 1)
       )
Warning in geom_point(aes(text = paste("Año:", format(year_month, "%Y"), :
Ignoring unknown aesthetics: text
ggplotly(p9, tooltip = "text")

APC chart for Latin America

p15 <- journal.amlat %>%
  count(Publisher, APC, name = "n") %>%
  group_by(Publisher) %>%
  filter(sum(n) >= 7) %>% 
  slice_max(n = 7, order_by = n) %>%
    ungroup() %>% 
  ggplot(aes(area = n, subgroup = Publisher, fill = APC)) +
  geom_treemap() +
  geom_treemap_subgroup_border(color = "black", size = 1) +
  geom_treemap_text(aes(label = Publisher), place = "centre", grow = TRUE, reflow = TRUE, min.size = 5) +
  scale_fill_manual(values = c("No" = "#8892FC", "Yes" = "#1F78B4")) +
  labs(
    title = "Registro de APC por editoriales en América Latina",
    fill = "APC",
    caption = add_caption()
  ) +
  theme( legend.position = "right" )

p15

Social Sciences and Humanities journals in Latin America

p10 <- journal.amlat %>%
  filter(Subjects == "Social Sciences") %>%  
  group_by(language) %>%
  count() %>%
  filter(n >= 5) %>%
  mutate(language = reorder(language, n)) %>%
  ggplot(aes(x = reorder(language, n), y = n, fill = language)) +
  geom_col_interactive(aes(
      tooltip = paste0("Idioma: ", language, "\nCantidad: ", n),
      data_id = language),
    show.legend = FALSE ) +
  theme_bw() +
  theme(legend.position = "none") + 
  scale_fill_viridis_d(option = "A") +
  labs(y = "Cantidad de publicaciones", x = "",
       title = "Revista de 'Social Science' según idiomas de publicación de América Latina",
       caption = add_caption()) +
       theme(plot.title = element_text(hjust = 1, size = 14, face = "bold"),
       axis.text.x = element_text(hjust = 1, size = 12),
       axis.text.y = element_text(hjust = 1, size = 12)
       ) + coord_flip()
girafe(
  ggobj = p10,
  width_svg = 8,
  height_svg = 5,
  options = list(
    opts_hover(css = "fill-opacity: 0.8; stroke: black; stroke-width: 1px;")))
p11 <- journal.amlat %>%
  filter(Subjects %in% social_sc) %>%
  group_by(language) %>%
  count() %>%
  filter(n >= 10) %>%
  mutate(language = reorder(language, n)) %>%
  ggplot(aes(x = reorder(language, n), y = n, fill = language)) +
  geom_col_interactive(aes(
      tooltip = paste0("Idioma: ", language, "\nCantidad: ", n),
      data_id = language),
    show.legend = FALSE ) +
  theme_bw() +
  theme(legend.position = "none") + 
  scale_fill_viridis_d(option = "A") +
  labs(y = "Cantidad de publicaciones", x= "", 
       title = "Revista de Cs. Sociales y Humanidades según idiomas de publicación de América Latina",
       caption = add_caption()) +
       theme(plot.title = element_text(hjust = 1, size = 14, face = "bold"),
       axis.text.x = element_text(hjust = 1, size = 12),
       axis.text.y = element_text(hjust = 1, size = 12)
       ) + coord_flip()

girafe(
  ggobj = p11,
  width_svg = 8,
  height_svg = 5,
  options = list(
    opts_hover(css = "fill-opacity: 0.8; stroke: black; stroke-width: 1px;")))

Journals by country, broken down by language

p12 <- journal.amlat %>%
  group_by(country, language) %>%
  count() %>%
  ungroup() %>%
  filter(n >= 5 & n <= 410) %>%
  mutate(language = fct_reorder(language, n)) %>%
  plot_ly(x = ~country, y = ~language, z = ~n,
        type = "heatmap", colors = viridisLite::viridis(100, option = "C"), 
        hovertemplate = paste(
      "País: %{x}<br>",
      "Idioma: %{y}<br>",
      "Publicaciones: %{z}<extra></extra>")
    ) %>%
  layout(title = list(
      text = "Distribución según idiomas de publicaciones académicas de América Latina",
      x = 0.5,  
      xanchor = "center"
    ),
    xaxis = list(tickangle = -45, tickfont = list(size = 8)),
    yaxis = list(tickangle = -25, tickfont = list(size = 8)),
    margin = list(l = 100, r = 20, t = 50, b = 50), 
    annotations = list(
      list(
        x = 0.35, y = -0.58, 
        xref = "paper", yref = "paper",
        showarrow = FALSE,
        xanchor = "left",
        yanchor = "top",
        font = list(size = 8),
        text = add_caption()
      )
    )
  )

p12

Stacked bar chart of persistent identifiers by country

p13 <- journal.amlat %>%
  filter(
    Subjects %in% social_sc,
    !is.na(Ids), Ids != ""
  ) %>%
  group_by(country, Ids) %>%
  summarise(count = n(), .groups = "drop") %>%
  filter(count >= 2) %>%
  group_by(country) %>%
  mutate(prop = count / sum(count)) %>%
  ggplot(aes(x = country, y = count, fill = Ids)) +
  geom_col_interactive(position = "stack",
    aes(tooltip = paste0("País: ", country,
                         "\nIdentificador: ", Ids,
                         "\nCantidad: ", count,
                         "\nProporción: ",
                         scales::percent(prop, accuracy = 0.1)), 
        data_id = Ids),
    show.legend = FALSE
  ) +
  scale_y_continuous(trans = "log10", labels = scales::comma) +
  labs(
    title = "Identificadores persistentes por país",
    x = "País",
    y = "Cantidad de revistas (escala log10)",
    fill = "Tipo de identificador", caption = add_caption()
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

girafe(
  ggobj = p13,
  width_svg = 9,
  height_svg = 5.5,
  options = list(
    opts_hover(css = "fill-opacity: 1; stroke: black; stroke-width: 1.5px;"),
    opts_hover_inv(css = "opacity: 0.2;") 
  )
)
p18 <- journal.amlat %>%
  group_by(Publisher)%>%
  count()%>%
  filter(n >= 12)%>%
  mutate(Publisher = reorder(Publisher, n)) %>%
  ggplot(aes(x = reorder(Publisher, -n), y = n, fill = Publisher)) +  
  geom_col_interactive(
    position = "stack",
    aes(
      tooltip = paste0("Editorial: ", Publisher,
                       "\nCantidad: ", n),
      data_id = Publisher 
    ),
    show.legend = FALSE
  ) + 
  theme_minimal() +
  scale_fill_scico_d(palette = "berlin", direction = -1) +
  labs(
    title = "Editoriales de las publicaciones en América Latina",
    subtitle = "Restringido a editoriales con al menos 12 revistas",
    y = "Cantidad de revistas",
    x = NULL,
    caption = add_caption()
  ) +
  theme(legend.position = "none",
    plot.title = element_text(face = "bold", size = 14),
    axis.text.y = element_text(size = 9) 
  ) +
  coord_flip()

girafe(
  ggobj = p18,
  width_svg = 9,
  height_svg = 5.5,
  options = list(
    opts_hover(css = "stroke: white; stroke-width: 2px;"),
    opts_hover_inv(css = "opacity: 0.3;")
  )
)
p19 <- journal.amlat %>%
  mutate(License = str_replace_all(License, "'", "")) %>%
  group_by(License) %>%
  count() %>%
  filter(n >= 5) %>%
  mutate(License = reorder(License, n)) %>%
  ggplot(aes(x = reorder(License, n), y = n, fill = License)) +
  geom_col_interactive(
    aes(
      # tooltip
      tooltip = paste0("Licencia: ", License, "\nCantidad: ", n),
      data_id = License
    ),
    show.legend = FALSE
  ) +
  scale_fill_scico_d(palette = "berlin") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.grid.major.x = element_blank(),
    legend.position = "none") +
  coord_flip() +
  labs(
    title = "Licencias utilizadas por todo tipo de publicaciones en América Latina",
    y = "Frecuencias",
    x = "Tipo de licencias",
    caption = add_caption())

girafe(
  ggobj = p19,
  width_svg = 8,
  height_svg = 5,
  options = list(
    opts_hover(css = "fill-opacity: 1; stroke: black; stroke-width: 1.5px;"),
    opts_hover_inv(css = "opacity: 0.2;") 
  )
)
p20 <- journal.amlat %>%
  group_by(Review)%>%
  count()%>%
  filter(n>=8)%>%
  mutate(Review = fct_reorder(Review, n)) %>%
  ggplot(aes(x = reorder(Review, n), y = n, fill = Review)) +
  geom_col_interactive(
    aes(
      # tooltip
      tooltip = paste0("Proceso de revisión: ", Review, "\nCantidad: ", n),
      data_id = Review),
    show.legend = FALSE
  ) +
  scale_fill_scico_d(palette = "devon") +
  theme_minimal() +
  theme(legend.position = "none") +
  ylab("Frecuencia") +
  xlab("Tipo de revisiones") +
  ggtitle("Proceso de revisión para publicaciones para las revistas de América Latina") +
  coord_flip() +
  labs(caption = add_caption())

girafe(
  ggobj = p20,
  width_svg = 8,
  height_svg = 5,
  options = list(
    opts_hover(css = "fill-opacity: 1; stroke: black; stroke-width: 1.5px;"),
    opts_hover_inv(css = "opacity: 0.2;")))
p21 <- journal.amlat %>%
  group_by(Subjects) %>%
  count() %>%
  filter(n > 15) %>%
  #mutate(Subjects = fct_reorder(Subjects, n)) %>%
  ggplot(aes(x = reorder(Subjects, n), y = n, fill = reorder(Subjects, n))) +
  geom_col_interactive(
    aes(
      # tooltip
      tooltip = paste0("Áreas: ", Subjects, "\nCantidad: ", n),
      data_id = Subjects),
    show.legend = FALSE
  ) +
  scale_fill_scico_d(palette = "devon") +
  coord_flip() + 
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(size = 10)
  ) +
  ylab("Frecuencia") +
  xlab("Áreas") +
  ggtitle("Áreas generales de publicación en América Latina") +
  labs(caption = add_caption())

girafe(
  ggobj = p21,
  width_svg = 8,
  height_svg = 5,
  options = list(
    opts_hover(css = "fill-opacity: 1; stroke: black; stroke-width: 1.5px;"),
    opts_hover_inv(css = "opacity: 0.2;")
  )
)